Bill Hart wrote:
> This sage ticket 6453 is too hard. It actually looks like a "bug" in MPIR to 
> me.

Hi,

I copied this to sage-devel too.

> The only way to test David Kirkby's fix on T2 it is to do sage -sh and
> then run the spkg-install, etc.

> That means I'd need my own version of sage built on that machine,
> otherwise the spkg-install fails due to permissions problems. 


> From
> what I recall, Sage takes many hours to build on that machine. Well
> that is if it did actually build in the first place with that
> compiler.

I've not managed to build Sage completely on either t2 or my home 
machine. Both fail at the same point though

"sage: There was an error installing modified sage library code."


The build speed problem is due to ATLAS. Here is the list of packages 
currently built on t2 (it's building now). Look at the times between them

kir...@t2:[/tmp/kirkby/sage-4.1/spkg/installed] $ ls -lrt
total 640
-rw-r--r--   1 kirkby   1093           0 Jul 16 20:45 dir-0.1
-rw-r--r--   1 kirkby   1093           0 Jul 16 20:45 prereq-0.3
-rw-r--r--   1 kirkby   1093           0 Jul 16 20:45 bzip2-1.0.5
-rw-r--r--   1 kirkby   1093         221 Jul 16 20:45 sage_scripts-4.1
-rw-r--r--   1 kirkby   1093         227 Jul 16 20:45 conway_polynomials-0.2
-rw-r--r--   1 kirkby   1093         216 Jul 16 20:58 mpir-1.2.p4
-rw-r--r--   1 kirkby   1093         221 Jul 16 20:59 termcap-1.3.1.p0
-rw-r--r--   1 kirkby   1093         220 Jul 16 21:00 readline-5.2.p7
-rw-r--r--   1 kirkby   1093         218 Jul 16 21:12 pari-2.3.3.p1
-rw-r--r--   1 kirkby   1093         217 Jul 16 21:30 ntl-5.4.2.p9
-rw-r--r--   1 kirkby   1093         222 Jul 16 22:04 eclib-20080310.p7
-rw-r--r--   1 kirkby   1093         220 Jul 16 22:04 graphs-20070722
-rw-r--r--   1 kirkby   1093         224 Jul 16 22:04 elliptic_curves-0.1
-rw-r--r--   1 kirkby   1093         216 Jul 16 22:04 extcode-4.1
-rw-r--r--   1 kirkby   1093         219 Jul 16 22:06 flint-1.3.0.p2
-rw-r--r--   1 kirkby   1093         218 Jul 16 22:06 zlib-1.2.3.p4
-rw-r--r--   1 kirkby   1093         220 Jul 16 22:12 sqlite-3.5.3.p4
-rw-r--r--   1 kirkby   1093         224 Jul 16 22:14 libgpg_error-1.6.p1
-rw-r--r--   1 kirkby   1093         223 Jul 16 22:21 libgcrypt-1.4.3.p1
-rw-r--r--   1 kirkby   1093         218 Jul 16 22:24 opencdk-0.6.6
-rw-r--r--   1 kirkby   1093         220 Jul 16 22:37 gnutls-2.2.1.p2
-rw-r--r--   1 kirkby   1093         218 Jul 16 22:38 libpng-1.2.35
-rw-r--r--   1 kirkby   1093         220 Jul 16 22:58 python-2.6.2.p1
-rw-r--r--   1 kirkby   1093         222 Jul 16 23:04 freetype-2.3.5.p1
-rw-r--r--   1 kirkby   1093         217 Jul 16 23:07 gd-2.0.35.p2
-rw-r--r--   1 kirkby   1093         221 Jul 16 23:07 gdmodule-0.56.p6
-rw-r--r--   1 kirkby   1093         224 Jul 16 23:07 fortran-20071120.p5
-rw-r--r--   1 kirkby   1093         223 Jul 16 23:15 lapack-20071123.p0
-rw-r--r--   1 kirkby   1093         219 Jul 17 07:43 atlas-3.8.3.p5
-rw-r--r--   1 kirkby   1093         216 Jul 17 08:14 gsl-1.10.p1
-rw-r--r--   1 kirkby   1093         218 Jul 17 08:16 iml-1.0.1.p11
-rw-r--r--   1 kirkby   1093         221 Jul 17 08:16 ipython-0.9.1.p0
-rw-r--r--   1 kirkby   1093         221 Jul 17 08:23 givaro-3.2.13rc2
-rw-r--r--   1 kirkby   1093         220 Jul 17 08:52 linbox-1.1.6.p0
-rw-r--r--   1 kirkby   1093         220 Jul 17 08:53 f2c-20070816.p1
-rw-r--r--   1 kirkby   1093         218 Jul 17 08:53 blas-20070724
-rw-r--r--   1 kirkby   1093         219 Jul 17 09:02 numpy-1.3.0.p0
-rw-r--r--   1 kirkby   1093         238 Jul 17 09:09 
matplotlib-0.98.5.3rc0-svn6910.p4
-rw-r--r--   1 kirkby   1093         223 Jul 17 09:09 mercurial-1.1.2.p0
-rw-r--r--   1 kirkby   1093         217 Jul 17 09:25 mpfr-2.4.1p0
-rw-r--r--   1 kirkby   1093         230 Jul 17 09:27 
mpfi-1.3.4-cvs20071125.p7
-rw-r--r--   1 kirkby   1093         219 Jul 17 09:27 pexpect-2.0.p4
-rw-r--r--   1 kirkby   1093         222 Jul 17 09:28 pycrypto-2.0.1.p4

What you may notice is a huge time between lapack-20071123.p0 @ 2315 and 
atlas-3.8.3.p5 @ 0743 the next day. In other words ATLAS is taking 8.5 
hours to build. It is the 'tuning' process which is taking 90% of that 
time.

That could be reduced dramatically if we could save the temporary files 
it builds, produce a set of tuning values for the sun4v architecture, 
then ATLAS would not need to be tuned every time. But despite trying, 
and asking for some help, I can't stop Sage deleting the intermediate 
files. See my thread:

"Can I keep build data once a package is  installed ok?"

There is a bit of code in $SAGE_ROOT/local/bin/sage-spkg


DELETE_TMP=1

changing that to zero, which is what William suggested to do, causes a 
syntax error. I can't work out what's happening. I suspect fixing this 
should be a high priority.

I'll create a trac ticket for this.

> I'm not the right person to be reviewing this.
> 
> I'm also completely unsure whether the patch is valid or not. David
> has done things like turn testing on which I've no idea is the right
> thing to do or not, from a Sage point of view. That should be bundled
> as a separate patch in my view.

I did *not* turn testing on. Perhaps I should have made that clearer.

The last few lines of spkg-install in mpfr-2.4.1.spkg were:

#uncomment the line below to avoid running the test suite.
#this should be done in all final releases
#exit 0
cd ..; ./spkg-check

In other words, despite the comment, the testing was on. I changed only 
the comments there, to:

# Do NOT uncomment this line and bypass the checks, as some failures
# have been observed and this should be carefully tested.
cd ..; ./spkg-check

(I just realised, I should have said do not comment the line and bypass 
the checks. )

Since this was 4.1, I assume that is a 'release'. Either way, the 
testing was on.

My personal opinion is that given

* Sage takes many hours to almost build
* Testing take about 10 minutes,
* Failures have been seen in mpfr

I feel it's best to leave mpfr testing on permanently.

> Besides that the issue seems to be a compiler bug, which occurs from
> gcc 4.3.0 onwards which makes MPN_ZERO miscompile when supplied with a
> zero operand. But MPN_ZERO is supposed to work with a zero operand, so
> this is actually a bug in MPIR. Well it would be if it weren't a
> compiler bug.

The gcc guys don't think it is. They appear to think it is a bug in 
Solaris, as they think the assembly code is correct.

> David Kirkby claims the problem is due to memset being implemented
> differently on that architecture with that version of Sun. But I don't
> think MPN_ZERO actually uses memset. It is way too slow. We have a
> macro which gets expanded for MPN_ZERO.

OK.

> Basically on T2 we should be patching MPIR so that it uses code which
> compiles correctly on T2, otherwise it is a bug in MPIR which may
> cause an infinite number of bugs in Sage. (MPN_ZERO is used a *LOT*).
> For this reason, I'm declaring the fix incorrect. I hate doing this,
> because David has put a huge amount of effort into this one.

OK, no problem. I'll just use it as a private one, if I need to use 
another version of gcc.

> Note there is no native mpn_store on T2.
> 
> The *only* correct fix for this is to write a native assembly language
> mpn_store for this architecture. We should also do the same for ia64
> while we are at it.


> Warning: this is not going to happen in the next day, or even week.
> We'll open an MPIR ticket for it. In the mean time, the *only* correct
> fix for Sage is to not compile Sage on T2 with gcc 4.3.0 or later. It
> should bail out with a warning. The compiler bug should also be
> reported by the MPIR crew to gcc, once we have verified that it is
> indeed a miscompilation. It is clear they haven't fixed this since it
> appeared in gcc 4.3.0.

It has already been reported. In fact, the URL to the report is in SPKG.txt:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40757

> You *cannot trust* Sage built on T2 with gcc 4.3.0 or later regardless
> of whether you get the test suite to pass or not. MPN_ZERO is a
> critical function used extensively in MPIR itself and many things
> which use it, e.g. it is used in the FFT.

I can understand your point of view here Bill.

One concern I have over not patching is that using gcc 4.2.4 might break 
the build of polybori. I believe it does, but I am in the process of 
doing a fresh build from scratch and verifying this. Polybori is 
seriously broken in too many ways to list, but Alexander Dreyer is 
looking at fixing that. He now has an account on t2.

There is no significant specific issues on t2 that I am aware - I'm 
observing the same on my machine at home, though I am not experiencing 
the MPFR issue.



--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to 
sage-devel-unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org
-~----------~----~----~----~------~----~------~--~---

Reply via email to