Excellent. I'm about to leave for the MAA-AMS Joint Meetings, so I won't get a chance to bench your K10 stuff on a Dunnington until I get back. I'll report on what speeds I get.
--jason

On Sun, Jan 4, 2009 at 5:44 PM, <ja...@njkfrudils.plus.com> wrote:
>
> On Sunday 04 January 2009 01:57:05 Jason Martin wrote:
>> Fair enough. I don't have any strong opinions on the matter. I
>> should probably update the Intel code to include your popcount routine
>> since the new Intel cores support the SSE 4.1 instruction for popcount
>> on the xmm registers.
>>
>> --jason
>
> It should run no problem, although at what speed!!!
>
> I'm going to put the k10 specific stuff in below
>
> x86_64/
> x86_64/core2/
> x86_64/amd64/
> x86_64/amd64/k10/
>
>>
>> On Sat, Jan 3, 2009 at 8:02 PM, <ja...@njkfrudils.plus.com> wrote:
>> > On Sunday 04 January 2009 00:57:44 ja...@njkfrudils.plus.com wrote:
>> >> On Sunday 04 January 2009 00:36:46 Jason Martin wrote:
>> >> > Alternatively, we could stop trying to identify chips by marketing
>> >> > brands and just use the values returned by CPUID. This would create a
>> >> > lot of duplicated code in sub-directories, but disk space is cheap.
>> >> > So, would something like:
>> >> >
>> >> > mpn/x86_64/<vendor>/<extended family number/model>
>> >> >
>> >> > work for our configuration?
>> >>
>> >> As most of the models are the same, this seems like a waste.
>> >> Also this assumes that CPUID is the only differentiator, what about
>> >> L2-cachesize (in the future?), GPU coprocessors
>> >>
>> >> How about, for each asm file, a description of minimum requirements
>> >> eg
>> >> add_n.asm requires x86_64,LAHF
>> >> lshift.asm requires x86_64,SSE4.2
>> >> hamdist.asm requires x86_64,popcnt
>> >>
>> >> and we only bother with the differences that we use, ie virtualization
>> >> instructions we don't bother with. This doesn't help with selecting among
>> >> functions that run at different speeds.
>> >>
>> >> I think what happens at the moment is nearly the best.
>> >
>> > Whoops, didn't finish ....
>> >
>> > When we get a function which splits an existing type into two (or more)
>> > subtypes then we duplicate the existing functions between the subtypes,
>> > and put the new function_1 into subtype1 and new function_2 into subtype2
>> >
>> >> > Jason Worth Martin
>> >> > Asst. Professor of Mathematics
>> >> > http://www.math.jmu.edu/~martin
>> >> >
>> >> > On Sat, Jan 3, 2009 at 5:49 PM, Bill Hart
>> >> > <goodwillh...@googlemail.com> wrote:
>> >> > > I think that features such as SSE should be tested for after testing
>> >> > > for the main chip core. So under /mpn/x86_64/k8 you'd have
>> >> > > directories for any features not available on all k8's.
>> >> > >
>> >> > > Bill.
>> >> > >
>> >> > > 2009/1/3 mabshoff <michael.absh...@mathematik.uni-dortmund.de>:
>> >> > >> On Jan 3, 2:25 pm, jason <ja...@njkfrudils.plus.com> wrote:
>> >> > >>> On Jan 3, 9:00 am, "Bill Hart" <goodwillh...@googlemail.com>
>> >> > >>> wrote:
>> >> > >>
>> >> > >> Hi,
>> >> > >>
>> >> > >>> > The new intel machines. And I don't know if all Dunnington's use
>> >> > >>> > the same family/system CPUID etc. So there might be multiple
>> >> > >>> > CPUID's we need to add to config.guess.
>> >> > >>> >
>> >> > >>> > Bill.
>> >> > >>>
>> >> > >>> We should change the lowest common denominator on an x86_64 system
>> >> > >>> to something more useful than 486, say P4 64bit without LAHF?,
>> >> > >>> then people can at least get mpir working on new machines without
>> >> > >>> mucking about
>> >> > >>
>> >> > >> Well, the trouble was that configure believed it was a 32 bit
>> >> > >> system, so I don't see much we can do there aside from attempting
>> >> > >> to compile things in 64 bit mode.
>> >> > >>
>> >> > >>> For the K10, we will need a separate directory for it, I have
>> >> > >>> mpn_popcount and mpn_hamdist which will not run on the K8,
>> >> > >>> requires SSE4.1a or whatever it's called ...
>> >> > >>> before 7/7.75 c/l now 1.5/1.75 c/l
>> >> > >>
>> >> > >> Wouldn't it be better to create a SSE4.1a directory and use that
>> >> > >> assembly code when SSE 4.1a is available? That seems to be the
>> >> > >> prevailing way to do things.
>> >> > >>
>> >> > >> On second thought: according to http://en.wikipedia.org/wiki/SSE4 it
>> >> > >> seems that there are three SSE4 flavors:
>> >> > >>
>> >> > >> * SSE 4.1
>> >> > >> * SSE 4.2
>> >> > >> * SSE 4.1a
>> >> > >>
>> >> > >> The last one seems to be K10 specific for now, but I would still
>> >> > >> recommend to test for SSE 4.1a if your code is that specific.
>> >> > >>
>> >> > >> <SNIP>
>> >> > >>
>> >> > >> Cheers,
>> >> > >>
>> >> > >> Michael

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com
To unsubscribe from this group, send email to mpir-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---
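
For reference, the "each asm file lists its minimum requirements" idea suggested in the thread could be sketched roughly as below. This is only an illustration, not how MPIR's configure works: the table, the feature strings and the function name usable_files are hypothetical, and the file/feature pairs are simply the examples given in the thread. In a real build the feature set would have to come from CPUID or config.guess, which is not shown here.

```python
# Hypothetical requirements table: file name -> features it needs.
# The entries mirror the examples given in the thread; nothing here
# is taken from the actual MPIR build system.
REQUIREMENTS = {
    "add_n.asm": {"x86_64", "LAHF"},
    "lshift.asm": {"x86_64", "SSE4.2"},
    "hamdist.asm": {"x86_64", "popcnt"},
}


def usable_files(cpu_features, requirements=REQUIREMENTS):
    """Return, sorted by name, the files whose requirements are all met."""
    available = set(cpu_features)
    # A file is usable when its required set is a subset of what the CPU has.
    return sorted(
        name for name, needed in requirements.items() if needed <= available
    )


if __name__ == "__main__":
    # Assumed feature set for illustration only.
    print(usable_files({"x86_64", "LAHF", "popcnt"}))
```

As noted in the thread, a table like this only says which files can run on a given CPU. It does not help choose between two usable versions that run at different speeds.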