Re: [petsc-dev] [petsc-users] compiler related error (configuring Petsc)

2023-08-01 Thread Barry Smith via petsc-dev


  Only four more years of this nonsense!

  One of the (now ancient) selling points of Unix was that it could be more 
nimble and evolve more rapidly than IBM's mainframe operating systems.

  Jacob,

Can't we have configure explicitly tell users what the situation is when they 
encounter this case, and how to resolve it, instead of needing constant email 
chatter for each individual Red Hat user?




> On Aug 1, 2023, at 2:42 PM, Satish Balay via petsc-users wrote:
> 
>> gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
> 
> Is it possible for you to use a newer version of the GNU compilers?
> 
> If not - your alternative is to build PETSc with --with-cxx=0 option
> 
> But then - you can't use --download-superlu_dist or any packages that need
> C++ [you could try building them separately though]
> 
> Satish
> 
> 
> On Tue, 1 Aug 2023, maitri ksh wrote:
> 
>> I am trying to compile PETSc on a cluster (x86_64-redhat-linux;
>> 'configure.log' is attached). Initially I got an error related to the
>> 'C++11' flag. To troubleshoot this, I used 'CPPFLAGS' and 'CXXFLAGS' and got
>> past the non-compliance error from the C++ compiler, but now I get another
>> error, 'cannot find a C preprocessor'. How do I fix this?
>> 
> 
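
For reference, a configure invocation along the lines Satish suggests might look
like the following; the MPI compiler wrapper names are illustrative and
site-specific, while --with-cxx=0 is the option in question:

```
./configure --with-cc=mpicc --with-fc=mpif90 --with-cxx=0
```

With C++ disabled this way, any --download package that needs C++ (such as
SuperLU_DIST) has to be built outside of PETSc, as noted above.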



Re: [petsc-dev] fixing bugs in resolution of TSEvent's

2023-07-18 Thread Barry Smith via petsc-dev

   We are aware of "issues" with TSEvent and appreciate you working on it.
Another user reported problems last week
(https://gitlab.com/petsc/petsc/-/issues/1414), which we resolved in an
associated MR; you might want to check those changes to see how/if they would
interact with your MR. As that MR indicates, we are far from confident that the
current code is correct.

  Barry


> On Jul 17, 2023, at 8:22 AM, Ilya Fursov wrote:
> 
> To whom it may concern,
> 
> A while ago I tried to use TS events in my time-stepping code, and I found 
> erroneous behaviour in the event-resolution algorithm. For that particular 
> example it resulted in a long series of extremely small time steps. After 
> examining the TSEvent handler code, I found some places with obvious errors, 
> but in general it was difficult to figure out exactly what was going on and 
> whether there were other errors.
> 
> I decided to try and fix it myself, and my choice was to refactor the 
> code from scratch. I abandoned the current (in my opinion, obscure) code 
> design of TSEventHandler and wrote my own version, while still using some 
> ideas from the original code. The refactoring mostly affected the algorithmic 
> part in the tsevent.c source file. The original constructors/destructors and 
> setters/getters are almost unchanged. Also, some tiny changes were made in 
> ts.c and tsadapt.c, where the interaction with events takes place.
> 
> I've run a number of tests, ensuring the new code runs smoothly. These tests 
> also revealed that the current/old PETSc code produces lots of errors. I didn't 
> take notes on the exact numbers, but my impression is that the old code failed 
> in 70-90% of my tests.
> 
> So, I would like to contribute what I've done to the project. I've never 
> done such a thing before; I hope the process goes smoothly.
> I've read the instructions on submitting code, and one thing I wanted to 
> ask:
> Do I need to start a feature branch from the "release" integration branch?
> On one hand, I only did bug fixing. On the other hand, I've replaced one 
> public API function (plus a runtime option), albeit a very small one, which 
> didn't work well with the logic of the new code.
> 
> Please advise if there might be any other things to consider while submitting 
> the code.
> 
> Best Regards,
> Ilya
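
For readers unfamiliar with the API under discussion, here is a minimal sketch
of registering a TS event, assuming the pre-refactor TSSetEventHandler()
signature; the indicator function and the direction/terminate choices are
purely illustrative:

```c
#include <petscts.h>

/* Illustrative indicator: the event fires when u[0] crosses zero. */
static PetscErrorCode Indicator(TS ts, PetscReal t, Vec U, PetscScalar fvalue[], void *ctx)
{
  const PetscScalar *u;

  PetscFunctionBeginUser;
  PetscCall(VecGetArrayRead(U, &u));
  fvalue[0] = u[0]; /* a zero of this value marks the event */
  PetscCall(VecRestoreArrayRead(U, &u));
  PetscFunctionReturn(PETSC_SUCCESS);
}

/* Register the event on an already configured TS. */
static PetscErrorCode RegisterEvent(TS ts)
{
  PetscInt  direction[1] = {-1};          /* detect + to - crossings only    */
  PetscBool terminate[1] = {PETSC_FALSE}; /* locate the event, keep stepping */

  PetscFunctionBeginUser;
  PetscCall(TSSetEventHandler(ts, 1, direction, terminate, Indicator, NULL, NULL));
  PetscFunctionReturn(PETSC_SUCCESS);
}
```

The event-resolution machinery Ilya describes is the part that detects and then
iteratively locates sign changes of fvalue[] between accepted steps.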



Re: [petsc-dev] PETSc future starting as a new design layer that runs on top of PETSc 3?

2022-07-31 Thread Barry Smith via petsc-dev



> On Jul 31, 2022, at 12:49 PM, Jacob Faibussowitsch wrote:
> 
> Sorry, hit send too early :)
> 
>> It may be worth it if there are significant benefits in safety and static 
>> analysis or if the tooling means contributors really have fewer moving parts.
> 
> Well for one, C ships with quite possibly the leanest standard library out 
> there. A good chunk of the contents of _p_PetscObject exists because we needed 
> to reimplement (from scratch!) some extremely basic functionality. A few 
> examples off the top of my head:
> 
> 1. Reference counting. Why write it yourself when you can just use 
> std::shared_ptr?
> 2. The various dynamic arrays (e.g. composed scalars). Why write them yourself 
> when you can just use std::vector?
> 3. The various type-names. Why manage these yourself when you can just use 
> std::string?

   If the use of these things appears in the PETSc API (won't it? You could 
use them only internally, but that seems silly for anyone using PETSc from C++), 
then how do we trivially map the API to other languages? 

> 4. Inheritance. No explanation needed.
> 
> OK, C++ is obviously not the only language that solves these problems, but my 
> point is that C's (lack of a) standard library is extremely limiting. Not to 
> mention the cause of a huge number of newbie bugs, because you have to reinvent 
> the wheel on data structures that have been around since the stone age.
> 
> Best regards,
> 
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
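
To make points 1-3 above concrete, here is a hedged C++ sketch of the kind of
bookkeeping being discussed; the struct and member names are illustrative, not
the actual _p_PetscObject layout:

```cpp
#include <memory>
#include <string>
#include <vector>

// Illustrative stand-in for object bookkeeping; not real PETSc internals.
struct Object {
  std::string         type_name; // point 3: type-names without manual char* handling
  std::vector<double> composed;  // point 2: dynamic array without hand-rolled realloc logic
};

int main()
{
  auto obj   = std::make_shared<Object>(); // point 1: reference counting for free
  auto alias = obj;                        // refcount is now 2; freed with the last owner
  alias->type_name = "vec";
  alias->composed.push_back(3.14);
  return 0;
}
```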
> 
>> On Jul 31, 2022, at 12:31, Jacob Faibussowitsch wrote:
>> 
>> Responding to 2 things here:
>> 
>>> Jacob, would you consider VTK to be "Modern C++"? It was designed in the 
>>> 90s, and I think C++11 isn't widely used (architecturally), since it was 
>>> only first allowed a few years ago.
>> 
>> 
>> I don’t know; I have personally never interacted with or read their codebase. 
>> Certainly any library that is pre-C++11 will have old C++ cruft. To give a 
>> counter-example, I would consider Kokkos or Thrust to be relatively modern 
>> C++ libraries.
>> 
>>> Does clang work with a high enough level of abstraction in its 
>>> representation of C++ to map directly to Python classes, for example.
>> 
>> Sure, we can walk the Clang AST. But then we are in the business of writing 
>> a domain-specific language, and are firmly tied to a compiler. From my time 
>> writing the Clang linter I am personally very comfortable with libclang, but 
>> I can tell you that it:
>> 
>> A. Takes a while to get up to speed on. The AST closely aligns with the source 
>> but is not overly “friendly". As an example, try walking the AST backwards 
>> (for example, to find wherever a variable is written to). You’ll find this is 
>> a monumental undertaking.
>> B. Has many small idiosyncrasies to learn that are somewhat sparsely 
>> documented. There are also no real “examples” to copy/learn from.
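
For a sense of what walking the Clang AST via libclang involves, here is a
minimal sketch; the file name is illustrative, and a real linter needs far more
than this:

```c
#include <clang-c/Index.h>
#include <stdio.h>

/* Print every cursor in the translation unit; forward traversal is the easy direction. */
static enum CXChildVisitResult visit(CXCursor c, CXCursor parent, CXClientData data)
{
  CXString kind = clang_getCursorKindSpelling(clang_getCursorKind(c));
  CXString name = clang_getCursorSpelling(c);

  (void)parent; (void)data;
  printf("%s: %s\n", clang_getCString(kind), clang_getCString(name));
  clang_disposeString(kind);
  clang_disposeString(name);
  return CXChildVisit_Recurse;
}

int main(void)
{
  CXIndex           idx = clang_createIndex(0, 0);
  CXTranslationUnit tu  = clang_parseTranslationUnit(idx, "example.c", NULL, 0,
                                                     NULL, 0, CXTranslationUnit_None);

  if (tu) clang_visitChildren(clang_getTranslationUnitCursor(tu), visit, NULL);
  clang_disposeTranslationUnit(tu);
  clang_disposeIndex(idx);
  return 0;
}
```

Walking forward like this is straightforward; as noted above, going backwards
(e.g. finding all writes to a variable) is where the real work starts.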
>> 
>>> Does Python have any useful high-level representation that could be used so 
>>> that we write in Python and generate from its representation?
>> 
>> Yes, Python exposes the “ast” module for direct introspection of Python code: 
>> https://docs.python.org/3/library/ast.html. You can then use this to 
>> generate arbitrary code. Team green uses this in their WARP library 
>> (https://github.com/NVIDIA/warp) to generate CUDA kernels from Python source. 
>> But doing so has enormous pitfalls, mostly to do with optimization. Python 
>> famously optimizes almost nothing, because you can never be sure that an 
>> expression doesn’t have a side effect.
>> 
>> Best regards,
>> 
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>> 
>>> On Jul 31, 2022, at 12:09, Barry Smith wrote:
>>> 
>>> 
 My issue with C++ is not the language itself, but the lack of discipline 
 of C++ developers. There are disastrous stories we all know well. But 
 there are successful ones, like VTK/ParaView.
>>> 
>>> I fear that it would be difficult to learn and maintain discipline in PETSc 
>>> C++ development. We are largely self-taught, want functionality quickly, 
>>> and the Google approach to learning C++ and implementing PETSc will be a 
>>> disaster. We would need to have a starting mechanism that prevents the 
>>> monkeys with machine guns.
>>> 
>>> To do multiple language mappings properly, I think we need to start with a 
>>> language with a powerful, high-level, useful AST (or some similar 
>>> representation) that automated tools can scarf through to generate language 
>>> bindings and verify the code is properly written from day one. Rather than 
>>> picking the language based on its syntax, flexibility, etc., we should pick 
>>> it based on this property. Does clang work at a high enough level of 
>>> abstraction in its representation of C++ to map directly to Python classes, 
>>> for example? Does Python have any useful high-level representation that 
>>> could be used so that we write in Python and generate from its 
>>> representation? Rust? Zig? Carbon? Fortran 2035?
>>> 
>>> 
>>> 
 On 

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-23 Thread Barry Smith via petsc-dev


> On Jan 23, 2022, at 10:47 PM, Jacob Faibussowitsch wrote:
> 
>> The outer LogEventBegin/End captures the entire time, including copies, 
>> kernel launches etc.
> 
> Not if the GPU call is asynchronous. To time the call, the stream must also be 
> synchronized with the host. The only way to truly time only the kernel calls 
> themselves is to wrap the actual call itself:
> 
> ```
> cublasXaxpy_petsc(…)
> {
>   PetscLogGpuTimeBegin();
>   cublasXaxpy(…);
>   PetscLogGpuTimeEnd();
> }
> ```

  Indeed, they are wrapped as above.

> 
> Note that
> 
> ```
> #define cublasXaxpy_petsc(…) 
> PetscLogGpuTimeBegin();cublasXaxpy(…);PetscLogGpuTimeEnd();
> ```
> 
> Is not sufficient, as this would still include transfers if those transfers 
> happen as direct arguments to the function:
> 
> ```
> cublasXaxpy_petsc(RAII_xfer_to_device(),…);


  I am not sure what you mean here. RAII_xfer_to_device()? Do you mean unified 
memory transfers down to the device? I don't think we use those.

  The PetscLogGpuTimeBegin()/End() was written by Hong so that it works with 
events to get a GPU timing; it is not supposed to include the CPU kernel-launch 
times or the time to move the scalar arguments to the GPU. It may not be perfect, 
but it is the best we can do to capture the time the GPU is actively doing the 
numerics, which is what we want.


> ```
> 
> Best regards,
> 
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
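
To illustrate the event-based timing Barry describes above, here is a sketch of
the general CUDA pattern; this is an assumption about the technique, not
PETSc's actual implementation, and the sizes are illustrative:

```c
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
  const int      n     = 1 << 20;
  const float    alpha = 2.0f;
  float         *x, *y, ms;
  cudaEvent_t    begin, end;
  cublasHandle_t handle;

  cudaMalloc((void **)&x, n * sizeof(float));
  cudaMalloc((void **)&y, n * sizeof(float));
  cudaMemset(x, 0, n * sizeof(float));
  cudaMemset(y, 0, n * sizeof(float));
  cublasCreate(&handle);
  cudaEventCreate(&begin);
  cudaEventCreate(&end);

  cudaEventRecord(begin, 0);                  /* queued on the stream, no host sync */
  cublasSaxpy(handle, n, &alpha, x, 1, y, 1); /* asynchronous kernel launch         */
  cudaEventRecord(end, 0);                    /* queued right after the kernel      */

  cudaEventSynchronize(end);                  /* synchronize only to read the time  */
  cudaEventElapsedTime(&ms, begin, end);      /* GPU time between the two events    */
  printf("GPU time: %f ms\n", ms);

  cudaEventDestroy(begin); cudaEventDestroy(end);
  cublasDestroy(handle);
  cudaFree(x); cudaFree(y);
  return 0;
}
```

Because both events are queued on the same stream as the kernel, the elapsed
time excludes host-side launch overhead and any copies outside the bracketed
region, which matches the intent described above.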
> 
>> On Jan 23, 2022, at 21:37, Barry Smith wrote:
>> 
>> 
>> 
>>> On Jan 23, 2022, at 10:01 PM, Junchao Zhang wrote:
>>> 
>>> 
>>> 
>>> On Sat, Jan 22, 2022 at 9:00 PM Junchao Zhang wrote:
>>> 
>>> 
>>> 
>>> On Sat, Jan 22, 2022 at 5:00 PM Barry Smith wrote:
>>> 
>>>   The GPU flop rate (when 100 percent flops on the GPU) should always be 
>>> higher than the overall flop rate (the previous column). For large problems 
>>> they should be similar, for small problems the GPU one may be much higher.
>>> 
>>>   If the CPU one is higher (when 100 percent flops on the GPU) something 
>>> must be wrong with the logging. I looked at the code for the two cases and 
>>> didn't see anything obvious.
>>> 
>>>   Junchao and Jacob,
>>>   I think some of the timing code in the Kokkos interface is wrong. 
>>> 
>>> *  The PetscLogGpuTimeBegin/End should be inside the viewer access code, 
>>> not outside it. (The GPU time is an attempt to best time the kernels, not 
>>> the other processing around the use of the kernels; that other stuff is 
>>> captured in the general LogEventBegin/End.)
>>> What about a potential host-to-device memory copy before calling a kernel? 
>>> Should we count it in the kernel time?
>> 
>>   Nope, absolutely not. The GPU time represents the time the GPU is doing 
>> active work. The outer LogEventBegin/End captures the entire time, including 
>> copies, kernel launches, etc. There is no reason to put the copy time in the 
>> GPU time, because then there would be no need for the GPU time: it would just 
>> duplicate the LogEventBegin/End. The LogEventBegin/End minus the GPU time 
>> represents any overhead from transfers.
>> 
>> 
>>> 
>>> Good point 
>>> *  The use of WaitForKokkos() is confusing and seems inconsistent. 
>>> I need to have a look. Until now, I have not paid much attention to Kokkos 
>>> profiling.
>>>  - For example, it is used in VecTDot_SeqKokkos(), which I would 
>>> think has a barrier anyway because it puts a scalar result into update? 
>>>  - Plus, PetscLogGpuTimeBegin/End is supposed to already have a 
>>> suitable system (that Hong added) to ensure the kernel is complete; reading 
>>> the manual page and looking at Jacob's cupmcontext.hpp, it seems to be there, 
>>> so I don't think WaitForKokkos() is needed in most places (or is Kokkos 
>>> asynchronous and needs this for correctness?) 
>>> But these won't explain the strange result of the overall flop rate being 
>>> higher than the GPU flop rate.
>>> 
>>>   Barry
>>> 
>>> 
>>> 
>>> 
>>> 
 On Jan 22, 2022, at 11:44 AM, Mark Adams wrote:
 
 I am getting some funny timings and I'm trying to figure them out.
 I figure the GPU flop rates are a bit higher because the timers are inside
 the CPU timers, but some are a lot bigger, or inverted:
 
 --- Event Stage 2: KSP Solve only
 
 MatMult  400 1.0 1.0094e+01 1.2 1.07e+11 1.0 3.7e+05 6.1e+04 0.0e+00  2 55 62 54  0  68 91 100 100  0 671849 857147 0 0.00e+00 0 0.00e+00 100
 MatView    2 1.0 4.5257e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0 0 0 0 0.00e+00 0 0.00e+00 0
 KSPSolve   2 1.0 1.4591e+01 1.1 1.18e+11 1.0 3.7e+05 6.1e+04 1.2e+03  2 60 62 54 60 100 100 100 100 100 512399 804048 0 0.00e+00 0 0.00e+00 100
 SFPack   400 1.0 2.4545e-03 

[petsc-dev] Contest models highlight inherent inefficiencies of scientific funding competitions

2020-02-09 Thread Barry Smith via petsc-dev


https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.365


Sent from my iPad