Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-06 Thread Daniel Kahn Gillmor
On Fri 2017-10-06 10:00:27 +0100, Chris Lamb wrote:
> If it were hardcoded into the filenames, one wouldn't need to do
> anything onerous, eg.
>
>   -rw-r--r-- 1 0 Oct  6 09:56 helloworld.adc83b19e793491b1c6ea0fd8b46cd9f32e592fc.class
>   -rw-r--r-- 1 0 Oct  6 09:56 helloworld.adc83b19e793491b1c6ea0fd8b46cd9f32e592fc.clj
>
> (Not entirely serious)

ah!  i hadn't even thought of that :)  I wonder whether any language
would consider such a construct.

> Just to underline, Python in Debian would not be a problem even with <
> unless you consider building a .deb with SOURCE_DATE_EPOCH="$(date +%s)"
> and installing that very same .deb within the same second...
>
>  … but I understand you were being more general about this topic!

yep, exactly -- i'm not saying that python is broken in debian, just
citing it as an example of another language that does the same kind of
thing, similarly to elisp, etc.

   --dkg



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-06 Thread Chris Lamb
Hi dkg,

> And there are more questions too: what if multiple source files
> contributed to the creation of the compiled artifact (e.g. "include"
> directives)?

Hm, that's an excellent point.

> You can also imagine a compilation regime that detects changes to a file
> (e.g. via inotify) and immediately triggers recompilation -- with a fast
> compiler and a coarse filesystem/archive timestamp, such a regime would
> end up in the same situation (serious performance impact).

Sure, but that doesn't seem like it would happen as part of a package
build?

> There are also problems with the digest based approach that lamby
> suggests: it's significantly more expensive to do a full source
> extraction and digest than it is to compare timestamp metadata.

If it were hardcoded into the filenames, one wouldn't need to do
anything onerous, eg.

  -rw-r--r-- 1 0 Oct  6 09:56 helloworld.adc83b19e793491b1c6ea0fd8b46cd9f32e592fc.class
  -rw-r--r-- 1 0 Oct  6 09:56 helloworld.adc83b19e793491b1c6ea0fd8b46cd9f32e592fc.clj

(Not entirely serious)
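
As a rough illustration of the filename idea, here is a minimal Python
sketch. The names `compiled_name` and `is_fresh` are hypothetical, and
nothing here reflects how Clojure or any other runtime actually behaves.
Note that this variant still hashes the source on every check, which is
the cost dkg objects to above.

```python
import hashlib
from pathlib import Path


def compiled_name(source: Path, suffix: str = ".class") -> str:
    """Name the compiled artifact after the SHA1 of the source contents."""
    digest = hashlib.sha1(source.read_bytes()).hexdigest()
    return f"{source.stem}.{digest}{suffix}"


def is_fresh(source: Path) -> bool:
    """True if an artifact matching the current source digest exists."""
    return (source.parent / compiled_name(source)).exists()
```

Any edit to the source changes the digest, so a stale artifact simply
stops matching; no timestamp comparison is involved.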

> It sounds to me like python has made a sensible tradeoff (accepting that
> equal timestamps means OK)

Just to underline, Python in Debian would not be a problem even with <
unless you consider building a .deb with SOURCE_DATE_EPOCH="$(date +%s)"
and installing that very same .deb within the same second...

 … but I understand you were being more general about this topic!


Regards,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org / chris-lamb.co.uk
   `-



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-05 Thread Daniel Kahn Gillmor
On Thu 2017-10-05 10:56:46 +0100, Chris Lamb wrote:
> I'd also be curious to know why you think *more* than one second could
> ever be needed here. I think I'm missing something.

some filesystems have a resolution > 1s :(

  http://www.ntfs.com/exfat-comparison.htm

shows that FAT32 has a 2s granularity when used without extensions.
Looks like the Linux kernel remembers a 1sec granularity while still
mounted, but shows just the 2sec granularity across remounts:


   mkfs -t vfat $blkdev
   mount $blkdev /mnt
   for a in 1 2 3; do
     touch /mnt/$a
     sleep 1
   done
   stat /mnt/* | grep Modify
   umount /mnt
   mount $blkdev /mnt
   stat /mnt/* | grep Modify
   umount /mnt


produces two batches of mtime stats:

Modify: 2017-10-05 12:56:14.0 -0700
Modify: 2017-10-05 12:56:15.0 -0700
Modify: 2017-10-05 12:56:16.0 -0700

Modify: 2017-10-05 12:56:14.0 -0700
Modify: 2017-10-05 12:56:14.0 -0700
Modify: 2017-10-05 12:56:16.0 -0700
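
The second batch is consistent with each timestamp being rounded down to
an even second. A tiny Python sketch of that assumption (`fat_mtime` is a
made-up helper, not a kernel or filesystem API):

```python
def fat_mtime(seconds):
    """Round an integer timestamp down to an even second (2s granularity)."""
    return seconds - (seconds % 2)


# Seconds-of-minute from the first batch above: 14, 15, 16.
print([fat_mtime(s) for s in (14, 15, 16)])  # [14, 14, 16]
```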



  --dkg



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-05 Thread Daniel Kahn Gillmor
On Wed 2017-10-04 19:45:49 +0100, Chris Lamb wrote:
> *Very* quick thoughts here: could some variant of a) be merged
> upstream…? Perhaps upstream could move to a hash-based system instead
> of using timestamps? eg. encoding the SHA1 of the file in the filename.

I'm thinking about this problem more generally than clojure
specifically -- other folks have raised python's .py → .pyc mappings and
i'm sure there are other similar frameworks.  I want to make sure we're
thinking about the various places that these checks happen.

It may also matter whether we're talking about a file stored in an archive
vs. one stored in the filesystem.  different archive formats and
different filesystems have different timestamp granularity (iirc, FAT
has 2s granularity, for example).

And there are more questions too: what if multiple source files
contributed to the creation of the compiled artifact (e.g. "include"
directives)?

You can also imagine a compilation regime that detects changes to a file
(e.g. via inotify) and immediately triggers recompilation -- with a fast
compiler and a coarse filesystem/archive timestamp, such a regime would
end up in the same situation (serious performance impact).

And of course, it's always possible to (accidentally or intentionally)
just "touch" the timestamps on a totally different bytecode file of the
appropriate name to trick or confuse this optimization step.

There are also problems with the digest based approach that lamby
suggests: it's significantly more expensive to do a full source
extraction and digest than it is to compare timestamp metadata.

--

So i think we have to ask what the goal of this check is from the upstream
platform's point of view:

 * is it strong assurance that the file was built from the
   exposed source?

 * is it a speedy (if fallible) sanity check?

i think that it can't really be the former (because of all the corner
cases outlined above), so the question is what kind of failure modes and
risks they're willing to tolerate.  Those that want absolute assurance
will be obliged to recompile each time unless they have some sort of
externally-audited mapping/manifest.

It sounds to me like python has made a sensible tradeoff (accepting that
equal timestamps means OK) and clojure has made a decision that tries to
get more of a guarantee than they can actually get, and sacrificed
performance for it.

--dkg



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-05 Thread Rob Browning
Chris Lamb writes:

> I don't quite get what you mean I'm afraid. Filesystem ordering (at least
> via readdir/listdir, etc.) is non-deterministic. Can you explain it to me
> another way?

(...or quite likely I'm not describing things all that well.)

In Clojure's case, I'd think that setting the .clj mtime to at least 1s
before the corresponding .class file in the jar should work fine, though
if Clojure's only consulting the jar, then any other offset that
registers as smaller should also work, i.e. it might not have to be a
full second inside jars.

But sticking with at least 1s should make things a bit more general
because then if you

  jar xf foo.jar

the resulting tree will still show the right relative offsets on common
filesystems (assuming "jar x" tries to preserve mtimes) so that any
tool, clojure, some clojure build tool, etc. will still work as expected
with the tree.


...then I started thinking more generally and wondered if (eventually)
we might be able to do something even more broadly helpful.

If we were to take any archive we're rewriting (tar, jar, cpio), and
sort all the files by decreasing mtime, then assign the set of files
with the largest mtime to have some mtime_0, assign the set of files
with the second largest mtime to have (mtime_0 - 1s), the third set to
(mtime_0 - 2s), etc., we'd preserve the overall ordering among the
files so that something like:

   tar xf some-reproducible-archive.tgz
   cd some-reproducible-archive
   make

would stand a good chance of just working as it would have with the
original archive.
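
A minimal Python sketch of that renumbering, assuming the mtimes have
already been read out of the archive into a dict (`compress_mtimes`, its
parameters, and the sample paths are all hypothetical):

```python
def compress_mtimes(mtimes, mtime_0, step=1):
    """Map each path to mtime_0 - rank * step, where rank 0 is the newest group.

    Files that shared an mtime keep sharing one; relative order is preserved.
    """
    distinct = sorted(set(mtimes.values()), reverse=True)
    rank = {t: i for i, t in enumerate(distinct)}
    return {path: mtime_0 - rank[t] * step for path, t in mtimes.items()}


original = {"Makefile": 100, "foo.c": 250, "foo.o": 900, "foo": 900}
print(compress_mtimes(original, mtime_0=1000))
# {'Makefile': 998, 'foo.c': 999, 'foo.o': 1000, 'foo': 1000}
```

The output depends only on the ordering of the input mtimes and on
mtime_0, not on the absolute build-time values.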

> I'd also be curious to know why you think *more* than one second could
> ever be needed here. I think I'm missing something.

I suspect 1s is just fine, and I have nothing concrete in mind here --
it just made me think of the general floating point issues (if any end
up involved in the path), e.g. 4.000...1 vs 4 vs 3.999... vs
rounding/truncation to the final value, etc.

Thanks
-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-05 Thread Chris Lamb
Hi Rob,

> Or rather, if Clojure's only looking at the timestamps in the jar file,
> then those may have a known (fixed) resolution, and so we'd just need to
> make sure that the .clj files are at least that much older than the
> corresponding .class files inside the jar.

Right; that's:

> >  b) We make strip-nondeterminism subtract 1 second from the .clj files'
> > target modification times so it matches with the existing ">".

.. is it not? :)

> Though I'd probably still pick 1s or more just so that an unpacked jar
> will still have the right timestamp ordering on the vast majority of
> filesystems.

I don't quite get what you mean I'm afraid. Filesystem ordering (at least
via readdir/listdir, etc.) is non-deterministic. Can you explain it to me
another way? I'd also be curious to know why you think *more* than one
second could ever be needed here. I think I'm missing something.


Regards,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org / chris-lamb.co.uk
   `-



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-04 Thread Rob Browning
> Chris Lamb writes:


>>  […] assumes a filesystem with 1s mtime resolution.

> Mmm, which is a completely fair assumption. See also:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=804339 !

While I did mention filesystem timestamps on IRC as an example, and they
are relevant for say make, do they matter here?

Or rather, if Clojure's only looking at the timestamps in the jar file,
then those may have a known (fixed) resolution, and so we'd just need to
make sure that the .clj files are at least that much older than the
corresponding .class files inside the jar.

Though I'd probably still pick 1s or more just so that an unpacked jar
will still have the right timestamp ordering on the vast majority of
filesystems.

Or perhaps we're not (re)building the jar(zip) manually, but building a
new one after round-tripping the files through the current filesystem?

...in which case perhaps an offset of a second or more is still
sufficient.

-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-04 Thread Phil Hagelberg
Hi; I'm the upstream maintainer of Leiningen, a Clojure application
being packaged for Debian.

I would strongly vote for adjusting the timestamps of .clj files to be
older than the corresponding .class files.

I don't know enough about filesystem timestamp granularity to comment on
the wisdom of >= vs >, but I do know that patches to Clojure from
outsiders (myself included) often take years to get applied (if ever)
and the value of maintaining compatibility with older versions of
Clojure shouldn't be underestimated.

Users of Leiningen will pull in whatever version of Clojure is specified
by their application (usually not the same one as is packaged by
Debian), and if jars from the Debian repository end up packaged with
the assumption that they are consumed with a >=-patched Clojure, this
will cause a lot of subtle confusion.

-Phil




Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-04 Thread Chris Lamb
Hi Elana,

> Hi all, just catching up on this thread.

No problem, great to see more people adding their thoughts! :)

>  […] assumes a filesystem with 1s mtime resolution.

Mmm, which is a completely fair assumption. See also:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=804339 !

> Having chatted with Phil Hagelberg (author of leiningen) as well, he 
> suggests we go for solution b) or something similar, as he believes this 
> to be a packaging concern as opposed to a core language problem.

In the abstract, I would agree; it is "just" a packaging problem that
we've caused ourselves.

However, do we really want to maintain a list of ".class" → ".clj"
mappings to hack around, essentially forever? :)

Further to this, in an ideal world, strip-nondeterminism should not (and
will not!) exist. Indeed, I love to *remove* handlers/features from it as
they get merged upstream.

*Very* quick thoughts here: could some variant of a) be merged
upstream…? Perhaps upstream could move to a hash-based system instead
of using timestamps? eg. encoding the SHA1 of the file in the filename.


Best wishes,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org / chris-lamb.co.uk
   `-



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-04 Thread Elana Hashman
Hi all, just catching up on this thread.

FWIW I agree with Apollon; as rlb pointed out on IRC, we introduce a 
potential race condition when we don't recompile when the timestamps are 
equal. Quoting him...

The scenario I was thinking of is "clock ticks over to 1001s, I 
compile foo.clj -> foo.class, then I edit foo.clj, then clock ticks 
over to 1002s, and we make a jar", but the filesystem says both the 
.clj and the .class are mtime 1001s even though foo.clj is 
different. This example assumes a filesystem with 1s mtime 
resolution.

Unlikely for a human editing files, of course, but could be problematic 
with e.g. automated build processes.
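
To make the quoted scenario concrete, here is a small Python sketch. The
sub-second event times and the `stored_mtime` helper are invented for
illustration; only the 1s resolution comes from the example above.

```python
import math


def stored_mtime(t, resolution=1):
    """Timestamp as recorded by a filesystem with the given resolution (seconds)."""
    return math.floor(t / resolution) * resolution


compiled_at = 1001.2   # foo.clj -> foo.class
edited_at = 1001.7     # foo.clj changed afterwards, within the same clock second

class_mtime = stored_mtime(compiled_at)
clj_mtime = stored_mtime(edited_at)

print(class_mtime == clj_mtime)   # True: the edit is invisible in the mtimes
print(class_mtime >= clj_mtime)   # True: '>=' would trust the stale .class
print(class_mtime > clj_mtime)    # False: '>' forces a recompile
```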

As such I don't actually know if Clojure upstream would be willing to 
accept the patch. I can submit just to see what they say? I'm honestly 
not sure they'll consider this a bug.

Having chatted with Phil Hagelberg (author of leiningen) as well, he 
suggests we go for solution b) or something similar, as he believes this 
to be a packaging concern as opposed to a core language problem.

- e



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-04 Thread Chris Lamb
Hi Vincent,

> FWIW, I agree with Apollon. >= is better than > as the resolution of the
> timestamp can be coarse.

Mm, I agree. Also, as strip-nondeterminism should really "go away" in the
medium- to long- term, I'd rather avoid adding ad-hoc modifications
(especially ones so ugly) for each language environment that can suffer
this issue.

> I am also worried Clojure may not be the only one using this.

I've just posted to -devel on this topic so this gets more exposure:

  https://lists.debian.org/debian-devel/2017/10/msg00073.html
 

Regards,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org / chris-lamb.co.uk
   `-



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-04 Thread Vincent Bernat
 ❦  4 October 2017 08:03 +0100, Chris Lamb:

>> Fixing our clojure package would only solve the issue for Debian.
>
> Sure, but why don't we patch Debian's version of clojure whilst we wait
> for upstream to "catch up"? :-)

FWIW, I agree with Apollon. >= is better than > as the resolution of the
timestamp can be coarse. I am also worried Clojure may not be the only
one using this. For example, Python may use the same thing for pyc (just
checked, it doesn't, it uses not(>=)). We don't ship pyc in packages,
but there may be other things like that.
-- 
Make sure comments and code agree.
- The Elements of Programming Style (Kernighan & Plauger)




Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-04 Thread Chris Lamb
Hi Emmanuel,

> Fixing our clojure package would only solve the issue for Debian.

Sure, but why don't we patch Debian's version of clojure whilst we wait
for upstream to "catch up"? :-)


Best wishes,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org / chris-lamb.co.uk
   `-



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-03 Thread Emmanuel Bourg
On 3/10/2017 at 19:49, Chris Lamb wrote:

> So, did you see Apollon's remarks to this bug?

I guess the messages were delayed, I didn't see them yesterday.


> Very happy to rollback the changes to strip-nondeterminism that
> implement b) if we go with a) in the end; I haven't uploaded yet.
> 
> Can we come to some conclusion here? :)

Fixing our clojure package would only solve the issue for Debian. If
strip-nondeterminism is also meant to be used outside Debian it's
probably worth keeping the tweak for .clj files until upstream addresses
the issue (at least for resources in jar files I think).

Emmanuel Bourg



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-03 Thread Chris Lamb
Hi Emmanuel,

> >   a) We patch clojure with ">="  (and send it upstream, etc. etc.)
> > 
> >   b) We make strip-nondeterminism subtract 1 second from the .clj files'
> >  target modification times so it matches with the existing ">".
> 
> I thought about b) too but this is definitely a clojure bug.

So, did you see Apollon's remarks to this bug? You seem to
disagree on where the bug is.

Very happy to rollback the changes to strip-nondeterminism that
implement b) if we go with a) in the end; I haven't uploaded yet.

Can we come to some conclusion here? :)


Regards,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org / chris-lamb.co.uk
   `-



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-03 Thread Emmanuel Bourg
On 3/10/2017 at 10:32, Chris Lamb wrote:

> Great stuff! So, we have two options as I see it:
> 
>   a) We patch clojure with ">="  (and send it upstream, etc. etc.)
> 
>   b) We make strip-nondeterminism subtract 1 second from the .clj files'
>  target modification times so it matches with the existing ">".
> 
> My preference is for "a)", naturally...

I thought about b) too but this is definitely a clojure bug.

Emmanuel Bourg



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-03 Thread Chris Lamb
Hi Apollon,


> Thanks for fixing this. Just a small comment: the comment in [1]
> should probably say "to always be older than .class", instead of
> "to always be younger".

Good idea; pushed in:

  https://anonscm.debian.org/git/reproducible/strip-nondeterminism.git/commit/?id=cb9261d05e6891153f3d44ad2cc6c0e3184dbc60


Best wishes,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org / chris-lamb.co.uk
   `-



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-03 Thread Apollon Oikonomopoulos
Hi Chris,

On Tue, 03 Oct 2017 15:00:11 +0100 Chris Lamb wrote:
> tags 877418 + pending
> thanks
> 
> > Setting the mtime of .clj files one second earlier than .class
> > should Do The Right Thing™.
> 
> Thanks. I've just pushed the following:
> 
>   https://anonscm.debian.org/git/reproducible/strip-nondeterminism.git/commit/?id=99af63bec965d924275d53f4db90f9853e4db8a7
> 
>   https://anonscm.debian.org/git/reproducible/strip-nondeterminism.git/commit/?id=3f92d1b3d5cfc7b9b82cec176b3e602d0a34fbaf
> 
>   https://anonscm.debian.org/git/reproducible/strip-nondeterminism.git/commit/?id=dec86231ce51db87d28db35fbedb9c887db569fd
> 
>   https://anonscm.debian.org/git/reproducible/strip-nondeterminism.git/commit/?id=7691e2980274c1b041b8730bae5a8f5374cbcbf1

Thanks for fixing this. Just a small comment: the comment in [1] should
probably say "to always be older than .class", instead of "to always be
younger".

Cheers,
Apollon

[1] https://anonscm.debian.org/git/reproducible/strip-nondeterminism.git/commit/?id=7691e2980274c1b041b8730bae5a8f5374cbcbf1



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-03 Thread Chris Lamb
tags 877418 + pending
thanks

> Setting the mtime of .clj files one second earlier than .class
> should Do The Right Thing™.

Thanks. I've just pushed the following:

  https://anonscm.debian.org/git/reproducible/strip-nondeterminism.git/commit/?id=99af63bec965d924275d53f4db90f9853e4db8a7

  https://anonscm.debian.org/git/reproducible/strip-nondeterminism.git/commit/?id=3f92d1b3d5cfc7b9b82cec176b3e602d0a34fbaf

  https://anonscm.debian.org/git/reproducible/strip-nondeterminism.git/commit/?id=dec86231ce51db87d28db35fbedb9c887db569fd

  https://anonscm.debian.org/git/reproducible/strip-nondeterminism.git/commit/?id=7691e2980274c1b041b8730bae5a8f5374cbcbf1


Regards,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org / chris-lamb.co.uk
   `-



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-03 Thread Apollon Oikonomopoulos
Hi Chris,

On Tue, 03 Oct 2017 09:32:39 +0100 Chris Lamb wrote:
> Hi Emmanuel,
> 
> > I eventually found this check performed in the load() method of RT.java:
> > 
> >   if((classURL != null &&
> >   (cljURL == null
> >  || lastModified(classURL, classfile) > lastModified(cljURL, scriptfile)))
> > 
> > Changing '>' with '>=' fixes the issue.
> 
> Great stuff! So, we have two options as I see it:
> 
>   a) We patch clojure with ">="  (and send it upstream, etc. etc.)
> 
>   b) We make strip-nondeterminism subtract 1 second from the .clj files'
>  target modification times so it matches with the existing ">".
> 
> My preference is for "a)", naturally...

I'm afraid a) is not the correct solution here. If you want to make sure 
that the bytecode is strictly newer than the source, you *have* to 
re-compile if they have the same mtime. This is especially true when 
taking into account that the mtime resolution is finite (and pretty 
coarse indeed in cases like ext3). Setting the mtime of .clj files one 
second earlier than .class should Do The Right Thing™.
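
A minimal Python sketch of option b), using only the standard `zipfile`
module. This is not strip-nondeterminism's implementation (which is
Perl); `normalize_jar` and its parameters are made up for illustration.
One assumption worth flagging: the classic ZIP header stores times in
MS-DOS format with 2-second resolution, so the sketch defaults to a
2-second offset to keep the .clj entry strictly older after rounding.

```python
import io
import zipfile
from datetime import datetime, timedelta


def normalize_jar(src_bytes, canonical, offset_seconds=2):
    """Return a copy of the archive with every member stamped `canonical`,
    except .clj members, which are stamped `offset_seconds` earlier."""
    older = canonical - timedelta(seconds=offset_seconds)
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            stamp = older if info.filename.endswith(".clj") else canonical
            new_info = zipfile.ZipInfo(info.filename,
                                       date_time=stamp.timetuple()[:6])
            # Carry over compression method and permission bits unchanged.
            new_info.compress_type = info.compress_type
            new_info.external_attr = info.external_attr
            dst.writestr(new_info, src.read(info))
    return out.getvalue()
```

With this in place, a strict '>' comparison between a .class entry and
its .clj entry succeeds, so the existing check keeps using the
precompiled class.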

Cheers,
Apollon



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-03 Thread Chris Lamb
Hi Emmanuel,

> I eventually found this check performed in the load() method of RT.java:
> 
>   if((classURL != null &&
>   (cljURL == null
>  || lastModified(classURL, classfile) > lastModified(cljURL, scriptfile)))
> 
> Changing '>' with '>=' fixes the issue.

Great stuff! So, we have two options as I see it:

  a) We patch clojure with ">="  (and send it upstream, etc. etc.)

  b) We make strip-nondeterminism subtract 1 second from the .clj files'
 target modification times so it matches with the existing ">".

My preference is for "a)", naturally...

> Hey having fun with a Java puzzle and not telling the Java Team? That's
> mean ;)

I was slightly scared we had broken Java performance throughout
Debian! *g*


Best wishes,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org / chris-lamb.co.uk
   `-



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-02 Thread Rob Browning
Chris Lamb writes:

> Chris Lamb wrote:
>
>> > I noticed that Debian's clojure-1.8.0.jar had terrible performance as
>> > compared to both the upstream jar
>> 
>> Oh boy, this sounds fun!
>
> There's no obvious reason at this point why this performance regression is
> limited to Clojure, unless — hopefully — it's related to the .clj files?
>
> ie. this could be affecting the performance of all Java applications
> in Debian (!)

I wondered if Clojure might be trying to be clever there, and...

  https://stackoverflow.com/questions/19594360/preserving-timestamps-on-clojure-clj-files-when-building-shaded-jar-via-maven-s

So maybe if you ensure the class files are newer than the .clj files?

Thanks for the help
-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-02 Thread Emmanuel Bourg
On Mon, 02 Oct 2017 12:00:36 +0100 Chris Lamb wrote:

> There's no obvious reason at this point why this performance regression is
> limited to Clojure, unless — hopefully — it's related to the .clj files?
> 
> ie. this could be affecting the performance of all Java applications
> in Debian (!)

Hey having fun with a Java puzzle and not telling the Java Team? That's
mean ;)

I quickly investigated this, it looks like the .clj files bundled in
clojure.jar are recompiled every time clojure is invoked if the jar was
processed by strip-nondeterminism. My guess was that the .clj files are
recompiled if the associated .class file is older, but it also happens
if they have the same date. I eventually found this check performed in
the load() method of RT.java:

  if((classURL != null &&
  (cljURL == null
 || lastModified(classURL, classfile) > lastModified(cljURL, scriptfile)))

Changing '>' with '>=' fixes the issue.
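
For illustration, the effect of the comparison can be sketched in Python.
This is not Clojure's code; `should_load_class` is a hypothetical name
that mirrors only the timestamp part of the check quoted above, and the
canonical time is an arbitrary value.

```python
def should_load_class(class_mtime, clj_mtime, strict=True):
    """Return True if the precompiled .class may be used instead of recompiling.

    strict=True mimics the quoted '>' comparison, strict=False the '>=' variant.
    """
    if strict:
        return class_mtime > clj_mtime
    return class_mtime >= clj_mtime


# After normalization every member of the jar carries the same timestamp.
canonical = 1506902400  # hypothetical canonical time, seconds since the epoch

print(should_load_class(canonical, canonical, strict=True))   # False: recompile
print(should_load_class(canonical, canonical, strict=False))  # True: use .class
```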

Emmanuel Bourg



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-02 Thread Chris Lamb
Chris Lamb wrote:

> > I noticed that Debian's clojure-1.8.0.jar had terrible performance as
> > compared to both the upstream jar
> 
> Oh boy, this sounds fun!

There's no obvious reason at this point why this performance regression is
limited to Clojure, unless — hopefully — it's related to the .clj files?

ie. this could be affecting the performance of all Java applications
in Debian (!)


Regards,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org / chris-lamb.co.uk
   `-



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-02 Thread Chris Lamb
tags 877418 + confirmed
thanks


Regards,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org / chris-lamb.co.uk
   `-



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-02 Thread Chris Lamb
tags 877418 + confirmed
thanks

Hi Rob,

> I noticed that Debian's clojure-1.8.0.jar had terrible performance as
> compared to both the upstream jar

Oh boy, this sounds fun!

I can confirm it is due to strip-nondeterminism. In particular, the
part that sets the last modified date of the .jar contents (!):

  --- a/lib/File/StripNondeterminism/handlers/zip.pm
  +++ b/lib/File/StripNondeterminism/handlers/zip.pm
  @@ -198,8 +198,6 @@ sub normalize {
$zip->addMember($member);
$options{member_normalizer}->($member)
  if exists $options{member_normalizer};
  - $member->setLastModFileDateTimeFromUnix(
  - $File::StripNondeterminism::canonical_time // SAFE_EPOCH);
if ($member->fileAttributeFormat() == FA_UNIX) {
$member->unixFileAttributes(
($member->unixFileAttributes() & oct(100))

Applying this hunk removes the observed performance regression entirely,
despite it altering the .jar (different sha1sum, etc.).

What might be a useful/relevant detail here is that if I apply the following
diff, *clamping* the time rather than always setting it:

  --- a/lib/File/StripNondeterminism/handlers/zip.pm
  +++ b/lib/File/StripNondeterminism/handlers/zip.pm
  @@ -198,8 +198,9 @@ sub normalize {
$zip->addMember($member);
$options{member_normalizer}->($member)
  if exists $options{member_normalizer};
  - $member->setLastModFileDateTimeFromUnix(
  - $File::StripNondeterminism::canonical_time // SAFE_EPOCH);
  + my $canonical_time = $File::StripNondeterminism::canonical_time // SAFE_EPOCH;
  + $member->setLastModFileDateTimeFromUnix($canonical_time)
  +   if $member->lastModTime() > $canonical_time;
if ($member->fileAttributeFormat() == FA_UNIX) {
$member->unixFileAttributes(
($member->unixFileAttributes() & oct(100))

… I get about a 25% performance regression:

 1.23s user 0.06s system 191% cpu 0.673 total
 2.08s user 0.09s system 231% cpu 0.940 total

Also, setting $canonical_time far in the future results in zero
performance regression again.
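
One possible reading of these three observations, sketched in Python
under the assumption that a .class file is only used when its timestamp
is strictly newer than the matching source file's. The (source, class)
mtime pairs and both helper names are invented for illustration.

```python
def normalize(mtime, canonical, clamp):
    """Always set to canonical, or only clamp values above canonical."""
    return min(mtime, canonical) if clamp else canonical


def recompiled(pairs, canonical, clamp):
    """Count pairs whose class is no longer strictly newer than its source."""
    count = 0
    for src_mtime, class_mtime in pairs:
        src_n = normalize(src_mtime, canonical, clamp)
        class_n = normalize(class_mtime, canonical, clamp)
        if not class_n > src_n:
            count += 1
    return count


pairs = [(100, 500), (450, 500), (480, 500)]  # hypothetical (source, class) mtimes

print(recompiled(pairs, canonical=400, clamp=False))     # 3: everything is equal
print(recompiled(pairs, canonical=400, clamp=True))      # 2: only some collapse
print(recompiled(pairs, canonical=10**6, clamp=True))    # 0: nothing is clamped
```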

This makes no sense whatsoever unless, perhaps, Java is ignoring .class
files at runtime based on their modification date compared to the current
time...?


Regards,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org / chris-lamb.co.uk
   `-



Bug#877418: dh-strip-nondeterminism: kills clojure performance

2017-10-01 Thread Rob Browning

Package: dh-strip-nondeterminism
Version: 0.034-1

I noticed that Debian's clojure-1.8.0.jar had terrible performance as
compared to both the upstream jar and one built manually via the "mvn
package" or ant process, and after some investigation, I think I've
tracked it down to dh-strip-nondeterminism.

Given the current clojure 1.8.0-2 source tree, adding this to
debian/rules:

  # Ask clojure to do nothing
  define timeclj
time java -cp debian/libclojure-java/usr/share/java/clojure-1.8.0.jar \
  clojure.main -e ''
  endef

  override_dh_strip_nondeterminism:
  	$(timeclj)
  	dh_strip_nondeterminism
  	$(timeclj)

and then running "fakeroot debian/rules binary" produces this:

  time java -cp debian/libclojure-java/usr/share/java/clojure-1.8.0.jar clojure.main -e ''

  real    0m0.919s
  user    0m1.739s
  sys     0m0.064s
  dh_strip_nondeterminism
  time java -cp debian/libclojure-java/usr/share/java/clojure-1.8.0.jar clojure.main -e ''

  real    0m4.064s
  user    0m12.204s
  sys     0m0.140s

Thanks
-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4