Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-11 Thread Richard Mills
Hi Barry,

I like your suggestion and I'll give this implementation a try.  I've used
some experimental tools that interpose on memory allocation calls and then
track the accesses to give similar information, but having what you suggest
implemented in PETSc would be easier and more useful in a lot of ways.

What we really need is dynamically updated priorities for what arrays get
placed in the high-bandwidth memory.  This sort of tracking might enable a
reasonable way to estimate these priorities.  (This only tells us about
PETSc's memory and doesn't solve the global problem, but it's a start.)

I have to think about it a bit more, but I still believe that using
something like move_pages(2) will preclude the use of a heap manager for the
high-bandwidth memory.  Maybe we don't need one.  If we do, then, yes, I
think we can deal with the inability to move an array between the different
types of memory while keeping the same virtual address, because we can just
switch the ->array pointer.

I'll plan to implement the very simple (threshold-based) placement approach
and the tracking you suggest, and then evaluate whether the simple approach
seems adequate or whether it would be worthwhile to support more complex
options.

--Richard

On Wed, Jun 3, 2015 at 7:39 PM, Barry Smith bsm...@mcs.anl.gov wrote:


   Richard,

If the code does not use VecSetValues() then one could measure the
 importance of each vector by counting two numbers, the number of times
 VecGetArray() is called on the vector and the number of times
 VecGetArrayRead() is called. We don't currently measure this but you could
 add cntread and cntwrite fields to _p_Vec and have VecGetArray[Read]()
 increment them. Then in VecDestroy() just have the vector print its name
 and the cnts. It would be interesting to see how many vectors there are,
 for example in src/ts/examples/tutorials (or a subdirectory), and what the
 distribution of these cnts is.
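
 A standalone toy sketch of this counting idea (none of these names are
 PETSc API; a real patch would add the fields to _p_Vec instead):

  #include <stdio.h>
  #include <stdlib.h>

  /* Toy illustration of the counting idea above.  A vector tallies how
     often its array is requested read-only vs. read-write, and reports
     the counts when destroyed. */
  typedef struct {
    const char *name;
    double     *array;
    long        cntread, cntwrite;
  } ToyVec;

  double *ToyVecGetArray(ToyVec *v)           { v->cntwrite++; return v->array; }
  const double *ToyVecGetArrayRead(ToyVec *v) { v->cntread++;  return v->array; }

  void ToyVecDestroy(ToyVec *v)
  {
    printf("%s: reads=%ld writes=%ld\n", v->name, v->cntread, v->cntwrite);
    free(v->array);
  }

  int main(void)
  {
    ToyVec x = {"x", malloc(100 * sizeof(double)), 0, 0};
    for (int i = 0; i < 30; i++) (void)ToyVecGetArrayRead(&x); /* read-heavy */
    (void)ToyVecGetArray(&x);                                  /* one write */
    ToyVecDestroy(&x); /* prints: x: reads=30 writes=1 */
    return 0;
  }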

Barry

 The reason this is unreliable when VecSetValues() is used is that EACH
 VecSetValues() call triggers VecGetArray(), which will result in artificially
 high write cnts even though each one represents accessing only a tiny part of
 the vector.


  On Jun 3, 2015, at 9:26 PM, Barry Smith bsm...@mcs.anl.gov wrote:
 
 
   To follow up on this, going back to my advice that the object passed to
 malloc be a 'living' object as opposed to just some flags: in the case where
 different vectors may have very different importances at different times in
 the runtime of the simulation, one could switch some vectors from slower
 to faster memory when one knows the code is switching to a different phase
 where the vector importances are different.
 
   Barry
 
   Note that even if Intel cannot provide a way to switch a memory
 address between fast and slow, it doesn't really matter from the PETSc point
 of view, since inside any particular PETSc vector we could switch the
 ->array pointer to a different memory location (and copy stuff over if
 needed) when changing a vector from important to unimportant or the
 opposite (since no code outside the vector object knows what the pointer
 is).
 
 
  On Jun 3, 2015, at 9:18 PM, Barry Smith bsm...@mcs.anl.gov wrote:
 
 
  On Jun 3, 2015, at 8:55 PM, Richard Mills r...@utk.edu wrote:
 
  Ha, yes.  I'll try this out, but I do wonder what people's thoughts
 are on the best way to tag an object like a Vec or Mat for some
 particular treatment of its placement in memory.  Does doing this at the
 level of a Mat or Vec (e.g., VecSetAdvMallocCtx()) sound appropriate?  We
 could actually make this a part of any PetscObject, but I think that's not
 necessary.
 
  No idea.
 
  Perhaps, and this is just nonsense off the top of my head, if you had
 some measure of the importance of a vector (or matrix; I would start with
 vectors for simplicity and since we have more of them) based on how often
 its values would be accessed. So a vector that you know is only used
 once in a while gets a lower importance than one that gets used very
 often. Of course determining these vectors' importances may be difficult.
 You could do it experimentally: add some code that measures how often each
 vector gets its values accessed (whatever that means) for read/write and see
 if there is some distribution (do this for a nontrivial TS example) where
 some vectors are accessed often and others rarely. Now place the often
 accessed vectors in faster memory and see how much faster the code is.
 
  Barry
 
  A related note is that we are not particularly careful about
 reusing work vectors; say a code has ten different work vectors for
 different phases of the computation; now imagine a careful global
 analysis that determined it could get away with three work vectors (since
 only at most three had relevant values at any one time), now pop those
 three work vectors into faster memory where the ten previous work vectors
 could not fit. Obviously I am being extreme here to make a point that
 careful memory decisions could 

Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-04 Thread Richard Mills
On Wed, Jun 3, 2015 at 8:44 PM, Barry Smith bsm...@mcs.anl.gov wrote:


  On Jun 3, 2015, at 10:35 PM, Jed Brown j...@jedbrown.org wrote:
 
  Barry Smith bsm...@mcs.anl.gov writes:
  [...]
  A smart Congress would say redefine 'beat us' to something that matters
  and stop wasting your time on vanity.

   Two words that will never be next to each other: smart congress


Why would someone who is actually smart be in congress, when their
intelligence gives them so many other options?

--Richard


Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Richard Mills
Hi Folks,

It's been a while, but I'd like to pick up this discussion of adding a
context to memory allocations again.

The immediate motivation I have is that I'd like to support use of the
memkind library (https://github.com/memkind/memkind), though adding a
context to PetscMallocN() (or making some other interface, say
PetscAdvMalloc() or whatever) could have much broader utility than simply
memkind support (which Jed doesn't like anyway, and I share some of his
concerns).  For the sake of having a concrete example, I'll discuss memkind
here.

Memkind's memkind_malloc() works like malloc() but takes a memkind_t
argument to specify some desired property of the memory being allocated.
For example,

 hugetlb_str = (char *)memkind_malloc(MEMKIND_HUGETLB, size);

returns a pointer to memory allocated using huge pages, and

hbw_preferred_str = (char *)memkind_malloc(MEMKIND_HBW_PREFERRED, size);

allocates memory from a high-bandwidth region if it's available and
elsewhere if not (specifying MEMKIND_HBW will insist on the allocation
coming from high-bandwidth memory, failing if it's not available).
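
For reference, a minimal complete sketch using these (real) memkind calls;
note that memkind_free() takes the same kind argument:

  #include <stdio.h>
  #include <memkind.h>   /* link with -lmemkind */

  int main(void)
  {
    size_t size = 1 << 20;
    /* prefer high-bandwidth memory, fall back to DRAM if unavailable */
    char *buf = (char *)memkind_malloc(MEMKIND_HBW_PREFERRED, size);
    if (!buf) { fprintf(stderr, "allocation failed\n"); return 1; }
    buf[0] = 'x';  /* touch the allocation */
    memkind_free(MEMKIND_HBW_PREFERRED, buf);
    return 0;
  }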

It should be straightforward to add a variant of PetscMalloc() that accepts
a context: I'll call this PetscAdvMalloc(), for now, though we can come up
with a better name later.  This will allow passing on the memkind_t via
this context to the underlying memkind allocator, and we can have some
mechanism to set a default context (in the case of Memkind, this is likely
MEMKIND_DEFAULT) that gets used when plain PetscMalloc() gets called.
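
A purely hypothetical sketch of what such an interface might look like
(PetscAdvMalloc() and PetscMallocAdvice are proposals, not existing PETSc
API; only the memkind calls and kinds are real):

  /* Hypothetical interface sketch.  The context carries
     allocator-specific advice, here a memkind_t. */
  typedef struct {
    memkind_t kind;  /* e.g. MEMKIND_HBW_PREFERRED or MEMKIND_DEFAULT */
  } PetscMallocAdvice;

  PetscErrorCode PetscAdvMalloc(size_t size, PetscMallocAdvice *advice, void **ptr)
  {
    memkind_t kind = advice ? advice->kind : MEMKIND_DEFAULT;
    *ptr = memkind_malloc(kind, size);
    if (!*ptr) SETERRQ(PETSC_COMM_SELF, PETSC_ERR_MEM, "memkind_malloc() failed");
    return 0;
  }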

Of course, we'll need some way to ensure that the advanced malloc gets
used to allocate the critical data structures.  As a low-level way to
start, it may make sense to simply add a way to stash a context in Vec and
Mat objects.  Maybe have VecSetAdvMallocCtx(), and if that context gets
set, then PetscAdvMalloc() is used for the allocations associated with the
contents of that object.  It would probably be better to eventually have a
higher-level way to do this, e.g., support standard settings in the options
database that PETSc uses to construct the appropriate arguments to
underlying allocators that are supported, but I think just adding a way to
set this context directly is an appropriate first step.
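
Continuing the hypothetical sketch above, the Vec hook might look like this
(the function and the field it sets are both invented for illustration):

  /* Stash an advice context in a Vec so subsequent internal allocations
     can consult it; advctx is an assumed new field in _p_Vec. */
  PetscErrorCode VecSetAdvMallocCtx(Vec v, PetscMallocAdvice *ctx)
  {
    v->advctx = ctx;
    return 0;
  }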

Does this sound like a reasonable thing for me to prototype, or are others
thinking something very different?  Please let me know.  I'm getting more
access to early systems I can experiment on, and I'd really like to move
forward on trying things with high bandwidth memory (imperfect as our APIs
for using it are).

Best regards,
Richard


On Wed, Apr 29, 2015 at 11:10 PM, Richard Mills r...@utk.edu wrote:

 On Wed, Apr 29, 2015 at 1:28 PM, Barry Smith bsm...@mcs.anl.gov wrote:


   Forget about the issue of changing PetscMallocN() or adding a new
 interface instead, that is a minor syntax and annoyance issue:

   The question is: is it worth exploring adding a context for certain
 memory allocations that would allow us to do various things to the memory
 and indicate properties of the memory? I think, though I agree with Jed
 that it could be fraught with difficulties, that it is worthwhile playing
 around with this.

   Barry


 I vote yes.  One might want to, say

 * Give hints via something like madvise() on how/when the memory might be
 accessed (see the sketch below).
 * Specify a preferred kind of memory (and behavior if the preferred kind
 is not available, or perhaps even specify a priority on how hard to try to
 get the preferred memory kind)
 * Specify something like a preference to interleave allocation blocks
 between different kinds of memory

 I'm sure we can come up with plenty of other possibilities, some of which
 might actually be useful, many of which will be useful only for very
 contrived cases, and some that are not useful today but may become useful
 as memory systems evolve.
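
 As a concrete example of the first item, madvise(2) is a real interface;
 a minimal sketch:

  #include <sys/mman.h>
  #include <stddef.h>

  /* Hint that a (page-aligned, e.g. mmap'd) region will be read
     sequentially, so the kernel can read ahead aggressively. */
  int hint_sequential(void *addr, size_t len)
  {
    return madvise(addr, len, MADV_SEQUENTIAL);
  }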

 --Richard



Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Jed Brown
Richard Mills r...@utk.edu writes:

 It's been a while, but I'd like to pick up this discussion of adding a
 context to memory allocations again.

Have you heard anything back about whether move_pages() will work?

 hbw_preferred_str = (char *)memkind_malloc(MEMKIND_HBW_PREFERRED, size);

How much would you prefer it?  If we stupidly ask for HBM in VecCreate_*
and MatCreate_*, then our users will see catastrophic performance drops
at magic sizes and will have support questions like "I swapped these two
independent lines and my code ran 5x faster."  Then they'll hack the
source by writing

  if (moon_is_waxing() && operator_holding_tongue_in_right_cheek()) {
policy = MEMKIND_HBW_PREFERRED;
  }

eventually making all decisions based on nonlocal information, ignoring
the advice parameter.

Then they'll get smart and register their own malloc so they don't have
to hack the library.  Then they'll try to couple their application with
another that does the same thing and now they have to write a new malloc
that makes a new set of decisions in light of the fact that multiple
libraries are being coupled.

I think we can agree that this is madness.  Where do you draw the line
and say that crappy performance is just reality?

It's hard for me not to feel like the proposed system will be such a
nightmarish maintenance burden with such little benefit over a simple
size-based allocation that it would be better for everyone if it doesn't
exist.

For example, we've already established that small allocations should
generally go in DRAM because they're either cached or not prefetched and
thus limited by latency instead of bandwidth.  Large allocations that
get used a lot should go in HBM so long as they fit.  Since we can't
determine "used a lot" or "fit" from any information possibly available
in the calling scope, there's literally no useful advice we can provide
at that point.  So don't try: just set a dumb threshold (crude tuning
parameter) or implement a profile-guided allocation policy (brittle).

Or ignore all this nonsense, implement move_pages(), and we'll have PETSc
track accesses so we can balance the pages once the app gets going.
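
For reference, a sketch of what the (real) move_pages(2) call looks like;
the helper name is invented and the target NUMA node number is
system-specific:

  #include <numaif.h>   /* move_pages(2); link with -lnuma */
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  /* Migrate the pages backing an array to another NUMA node (e.g. the
     node exposing MCDRAM) after allocation, based on observed usage. */
  int migrate_array(void *array, size_t bytes, int target_node)
  {
    long pagesize = sysconf(_SC_PAGESIZE);
    unsigned long npages = (bytes + pagesize - 1) / pagesize;
    void **pages = malloc(npages * sizeof(void *));
    int *nodes   = malloc(npages * sizeof(int));
    int *status  = malloc(npages * sizeof(int));
    for (unsigned long i = 0; i < npages; i++) {
      pages[i] = (char *)array + i * pagesize;
      nodes[i] = target_node;
    }
    long rc = move_pages(0 /* this process */, npages, pages, nodes,
                         status, MPOL_MF_MOVE);
    if (rc < 0) perror("move_pages");
    free(pages); free(nodes); free(status);
    return (int)rc;
  }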

 Of course, we'll need some way to ensure that the advanced malloc 

I thought AdvMalloc was short for AdvisedMalloc.




Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Jed Brown
Barry Smith bsm...@mcs.anl.gov writes:
 Yes, Jed has already transformed himself into a cranky old conservative PETSc 
 developer

Is disinclination to spend effort on something with negative expected
value conservative?

Actually, it's almost the definition.  But if you spend time on
legitimately high-risk things, you should expect that with high
probability, they will be a failure.  Thus, it's essential to be
prepared to declare failure rather than lobbying for success (e.g.,
merging) without conclusive data.  Declaring failure in this case may be
hard without access to the hardware to be able to push all the design
corners.




Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Richard Mills
On Wed, Jun 3, 2015 at 6:04 PM, Jed Brown j...@jedbrown.org wrote:

 Richard Mills r...@utk.edu writes:

  It's been a while, but I'd like to pick up this discussion of adding a
  context to memory allocations again.

 Have you heard anything back about whether move_pages() will work?


move_pages() will work to move pages between MCDRAM and DRAM right now, but
it screws up memkind's partitioning of the heap (it won't be aware that the
pages have been moved).  (Which calls to mind the question I raised
somewhere back in this thread of whether we even need a heap manager for
the large allocations.)



  hbw_preferred_str = (char *)memkind_malloc(MEMKIND_HBW_PREFERRED, size);

 How much would you prefer it?  If we stupidly ask for HBM in VecCreate_*
 and MatCreate_*, then our users will see catastrophic performance drops
 at magic sizes and will have support questions like "I swapped these two
 independent lines and my code ran 5x faster."  Then they'll hack the
 source by writing

    if (moon_is_waxing() && operator_holding_tongue_in_right_cheek()) {
 policy = MEMKIND_HBW_PREFERRED;
   }

 eventually making all decisions based on nonlocal information, ignoring
 the advice parameter.

 Then they'll get smart and register their own malloc so they don't have
 to hack the library.  Then they'll try to couple their application with
 another that does the same thing and now they have to write a new malloc
 that makes a new set of decisions in light of the fact that multiple
 libraries are being coupled.

 I think we can agree that this is madness.  Where do you draw the line
 and say that crappy performance is just reality?

 It's hard for me not to feel like the proposed system will be such a
 nightmarish maintenance burden with such little benefit over a simple
 size-based allocation that it would be better for everyone if it doesn't
 exist.


Jed, I'm with you in thinking that, ultimately, there actually needs to be
a way to make these kinds of decisions based on global information.  We
don't have that right now.  But if we get some smart allocator (and
migrator) that gives us, say, malloc_use_oracle() to always make the good
decision, we should still have something like a PetscAdvMalloc() that
provides a context allowing us to pass advice to this smart allocator:
hints about how the memory will be accessed, whatever.

I know you don't like the memkind model, and I'm not thrilled with it
either (though it's what I've got to work with right now), but the
interface changes I'm proposing are applicable to other approaches.


 For example, we've already established that small allocations should
 generally go in DRAM because they're either cached or not prefetched and
 thus limited by latency instead of bandwidth.  Large allocations that
 get used a lot should go in HBM so long as they fit.  Since we can't
 determine "used a lot" or "fit" from any information possibly available
 in the calling scope, there's literally no useful advice we can provide
 at that point.  So don't try: just set a dumb threshold (crude tuning
 parameter) or implement a profile-guided allocation policy (brittle).


In a lot of cases, simple size-based allocation is probably the way to go.
An option to do automatic size-based placement is even in the latest
memkind sources on github now, but it will do that for the entire
application.  I'd like to be able to restrict this to only the PETSc
portion: Maybe a code that uses PETSc also needs to allocate some enormous
lookup tables that are big but have accesses that are really latency-
rather than bandwidth-sensitive.  Or, to be specific to a code I actually
know, I believe that in PFLOTRAN there are some pretty large allocations
required for auxiliary variables that don't need to go in high-bandwidth
memory, though we will want all of the large PETSc objects to go in there.



 Or ignore all this nonsense, implement move_pages(), and we'll have PETSc
 track accesses so we can balance the pages once the app gets going.

  Of course, we'll need some way to ensure that the advanced malloc

 I thought AdvMalloc was short for AdvisedMalloc.


Oh, hey, I do like Advised better.

--Richard


Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Jed Brown
Richard Mills r...@utk.edu writes:
  but it screws up memkind's partitioning of the heap (it won't be aware
  that the pages have been moved).

 Then memkind is stupid or the kernel isn't exposing the correct
 information to memkind.  Tell them to not be lazy and do it right.


 I believe that it really comes down to a problem with what the Linux kernel
 allows right now.  To do this right we need to hack the kernel.  Memkind
 is working within the constraints of what the kernel currently does.

What exactly is memkind trying to do?  Does it somehow discover the
number of compute processes running on the node and partition their
allocation from MCDRAM?  Surely not because that would be as comically
naive as Blue Gene partitioning memory at boot, but what *does* it do
about other processes?  If you spawn new processes, can they use MCDRAM?
How much?  How is memkind budgeting affected by a user's direct use of
mmap or shm_open?  When a process exits, does memkind in the remaining
processes know that more MCDRAM is available?




Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Barry Smith

 On Jun 3, 2015, at 10:04 PM, Jeff Hammond jeff.scie...@gmail.com wrote:
 
 If everyone would just indent with tabs, we could just set the indent
 spacing with our editors ;-)

  Ah, heresy, kill him!


 
 On Wed, Jun 3, 2015 at 10:01 PM, Barry Smith bsm...@mcs.anl.gov wrote:
 
 On Jun 3, 2015, at 9:58 PM, Jeff Hammond jeff.scie...@gmail.com wrote:
 
 http://git.mpich.org/mpich.git/blob/HEAD:/src/mpi/init/init.c
 https://github.com/open-mpi/ompi/blob/master/ompi/mpi/c/init.c
 
  As I said, super insane :-)
 
  Barry
 
  I'm just having fun here; I do believe that 2 is the ultimate correct 
 indentation but I can always run a preprocessor to fix their code before I 
 use it :-)
 
 
 Jeff
 
 On Wed, Jun 3, 2015 at 9:43 PM, Barry Smith bsm...@mcs.anl.gov wrote:
 
 Jeff,
 
  Ahh, from this page, it is definitively clear that the Intel people have 
 their heads totally up their asses
 
 formatted source code with astyle --style=linux --indent=spaces=4 -y -S
 
 when everyone knows that any indent that is not 2 characters is totally 
 insane :-)
 
 Barry
 
 
 On Jun 3, 2015, at 9:37 PM, Jeff Hammond jeff.scie...@gmail.com wrote:
 
 but it screws up memkind's partitioning of the heap (it won't be aware
 that the pages have been moved).
 
 Then memkind is stupid or the kernel isn't exposing the correct
 information to memkind.  Tell them to not be lazy and do it right.
 
 The beauty of git/github is one can make branches to try out anything
 they want even if Jed thinks that he knows better than Intel how to
 write system software for Intel's hardware.
 
 This link is equivalent to pushing the Fork button on Github's
 memkind page: https://github.com/memkind/memkind#fork-destination-box.
 I'm sure that the memkind developers would be willing to review your
 pull request once you've implemented memkind_move_pages().
 
 Jeff
 
 --
 Jeff Hammond
 jeff.scie...@gmail.com
 http://jeffhammond.github.io/
 
 
 
 
 --
 Jeff Hammond
 jeff.scie...@gmail.com
 http://jeffhammond.github.io/
 
 
 
 
 -- 
 Jeff Hammond
 jeff.scie...@gmail.com
 http://jeffhammond.github.io/



Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Barry Smith

 On Jun 3, 2015, at 10:28 PM, Jed Brown j...@jedbrown.org wrote:
 
 Barry Smith bsm...@mcs.anl.gov writes:
  Sure but the super high-end (DOE LCF centers) focus allows (actually
  loves) the need for super brittle stuff.  
 
 Job security.
 
  It is not the bread and butter of PETSc but if they (silly ASCR) are
  willing to foot our bills to do our bread and butter by pandering to
  the super brittle high-end what's the harm in pandering (aside
  from our souls) since we are not actually doing the work :-)
 
 I have ethical objections to ruining the careers of scientists capable
 of doing important things.

  You are not. The few unethical bastards who have hitched their train to 
exascale will do their stuff regardless of whether you exist or not; meanwhile 
the vast majority who have not hitched their train to exascale benefit from the 
work you do that they can utilize to do science. The poor unfortunate souls 
who are told "you must use x, y or z because of X, Y, or Z" and get screwed 
are not screwed by you, they are screwed by the snake oil salesman, and all you 
can do is warn them about the snake oil salesman, which you do. Better to use a 
little bit of the exascale money to push it in a better direction than to allow 
all of that money to move things in the wrong direction; just because you can't 
FIX exascale doesn't mean it is unethical to use some of the money to move it 
slightly in a better direction.







Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Jeff Hammond
 but it screws up memkind's partitioning of the heap (it won't be aware
 that the pages have been moved).

 Then memkind is stupid or the kernel isn't exposing the correct
 information to memkind.  Tell them to not be lazy and do it right.

The beauty of git/github is one can make branches to try out anything
they want even if Jed thinks that he knows better than Intel how to
write system software for Intel's hardware.

This link is equivalent to pushing the Fork button on Github's
memkind page: https://github.com/memkind/memkind#fork-destination-box.
I'm sure that the memkind developers would be willing to review your
pull request once you've implemented memkind_move_pages().

Jeff

-- 
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/


Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Jed Brown
Jeff Hammond jeff.scie...@gmail.com writes:
 The beauty of git/github is one can make branches to try out anything
 they want even if Jed thinks that he knows better than Intel how to
 write system software for Intel's hardware.

I'm objecting to the interface.  I think that if they try to get memkind
merged into the existing libnuma project, they'll see similar
resistance.  It is essential for low-level interfaces to create
foundations that can be reliably built upon, not gushing wounds that
bleed complexity into everything built on top.

 This link is equivalent to pushing the Fork button on Github's
 memkind page: https://github.com/memkind/memkind#fork-destination-box.
 I'm sure that the memkind developers would be willing to review your
 pull request once you've implemented memkind_move_pages().

1. I cannot test it because I don't have access to the hardware.

2. I think memkind is solving the wrong problem in the wrong way.

3. According to Richard, the mature move_pages(2) interface has been
implemented.  That's what I wanted, so I'll just use that -- memkind
dependency gone.




Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Barry Smith

 On Jun 3, 2015, at 9:51 PM, Jed Brown j...@jedbrown.org wrote:
 
 Barry Smith bsm...@mcs.anl.gov writes:
  Perhaps, and this is just nonsense off the top of my head, if you
  had some measure of the importance of a vector (or matrix; I would
  start with vectors for simplicity and since we have more of them)
  based on how often its values would be accessed. So a vector that
  you know is only used once in a while gets a lower importance
  than one that gets used very often. Of course determining these
  vectors' importances may be difficult. You could do it
  experimentally: add some code that measures how often each vector
  gets its values accessed (whatever that means) for read/write and see
  if there is some distribution (do this for a nontrivial TS example)
  where some vectors are accessed often and others rarely. 
 
 This is what I termed profile-guided and it's very accurate (you have
 global space-time information), but super brittle when
 resource-constrained.

  Sure but the super high-end (DOE LCF centers) focus allows (actually loves)  
the need for super brittle stuff.  It is not the bread and butter of PETSc 
but if they (silly ASCR) are willing to foot our bills to do our bread and 
butter by pandering to the super brittle high-end what's the harm in 
pandering (aside from our souls) since we are not actually doing the work :-)


 
 Note that in case of Krylov solvers, the first vectors in the Krylov
 space are accessed far more than later vectors (e.g., the 30th vector is
 accessed once per 30 iterations versus the first vector which is
 accessed every iteration).  Simple greedy allocation is great for this
 case.
 
 It's terrible in other cases, a simple case of which is two solvers
 where the first is cheap (or solved only rarely) and the second is
 solved repeatedly at great expense.  Nested solvers are one such
 example.  But you don't know which one is more expensive except in
 retrospect, and this can even change as nonlinearities evolve.



Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Richard Mills
Ha, yes.  I'll try this out, but I do wonder what people's thoughts are on
the best way to tag an object like a Vec or Mat for some particular
treatment of its placement in memory.  Does doing this at the level of a
Mat or Vec (e.g., VecSetAdvMallocCtx()) sound appropriate?  We could
actually make this a part of any PetscObject, but I think that's not
necessary.

--Richard

On Wed, Jun 3, 2015 at 6:50 PM, Barry Smith bsm...@mcs.anl.gov wrote:


   The beauty of git/bitbucket is one can make branches to try out anything
 they want even if some cranky old conservative PETSc developer thinks it is
 worse than consorting with the devil.

As I said before, I think that the additional argument to advised_malloc
 should be a living object which one can change over time, as opposed to just
 a flag-type argument that only affects the malloc at malloc time. Of
 course the living part can be implemented later.

Barry

 Yes, Jed has already transformed himself into a cranky old conservative
 PETSc developer


  On Jun 3, 2015, at 7:33 PM, Richard Mills r...@utk.edu wrote:
 
  Hi Folks,
 
  It's been a while, but I'd like to pick up this discussion of adding a
 context to memory allocations again.
 
  The immediate motivation I have is that I'd like to support use of the
 memkind library (https://github.com/memkind/memkind), though adding a
 context to PetscMallocN() (or making some other interface, say
 PetscAdvMalloc() or whatever) could have much broader utility than simply
 memkind support (which Jed doesn't like anyway, and I share some of his
 concerns).  For the sake of having a concrete example, I'll discuss memkind
 here.
 
  Memkind's memkind_malloc() works like malloc() but takes a memkind_t
 argument to specify some desired property of the memory being allocated.
 For example,
 
   hugetlb_str = (char *)memkind_malloc(MEMKIND_HUGETLB, size);
 
  returns a pointer to memory allocated using huge pages, and
 
   hbw_preferred_str = (char *)memkind_malloc(MEMKIND_HBW_PREFERRED, size);
 
  allocates memory from a high-bandwidth region if it's available and
 elsewhere if not (specifying MEMKIND_HBW will insist on the allocation
 coming from high-bandwidth memory, failing if it's not available).
 
  It should be straightforward to add a variant of PetscMalloc() that
 accepts a context: I'll call this PetscAdvMalloc(), for now, though we can
 come up with a better name later.  This will allow passing on the memkind_t
 via this context to the underlying memkind allocator, and we can have some
 mechanism to set a default context (in the case of Memkind, this is likely
 MEMKIND_DEFAULT) that gets used when plain PetscMalloc() gets called.
 
  Of course, we'll need some way to ensure that the advanced malloc gets
 used to allocate the critical data structures.  As a low-level way to
 start, it may make sense to simply add a way to stash a context in Vec and
 Mat objects.  Maybe have VecSetAdvMallocCtx(), and if that context gets
 set, then PetscAdvMalloc() is used for the allocations associated with the
 contents of that object.  It would probably be better to eventually have a
 higher-level way to do this, e.g., support standard settings in the options
 database that PETSc uses to construct the appropriate arguments to
 underlying allocators that are supported, but I think just adding a way to
 set this context directly is an appropriate first step.
 
  Does this sound like a reasonable thing for me to prototype, or are
 others thinking something very different?  Please let me know.  I'm getting
 more access to early systems I can experiment on, and I'd really like to
 move forward on trying things with high bandwidth memory (imperfect as our
 APIs for using it are).
 
  Best regards,
  Richard
 
 
  On Wed, Apr 29, 2015 at 11:10 PM, Richard Mills r...@utk.edu wrote:
  On Wed, Apr 29, 2015 at 1:28 PM, Barry Smith bsm...@mcs.anl.gov wrote:
 
Forget about the issue of changing PetscMallocN() or adding a new
 interface instead, that is a minor syntax and annoyance issue:
 
    The question is: is it worth exploring adding a context for certain
 memory allocations that would allow us to do various things to the memory
 and indicate properties of the memory? I think, though I agree with Jed
 that it could be fraught with difficulties, that it is worthwhile playing
 around with this.
 
Barry
 
 
  I vote yes.  One might want to, say
 
  * Give hints via something like madvise() on how/when the memory might
 be accessed.
  * Specify a preferred kind of memory (and behavior if the preferred
 kind is not available, or perhaps even specify a priority on how hard to
 try to get the preferred memory kind)
  * Specify something like a preference to interleave allocation blocks
 between different kinds of memory
 
  I'm sure we can come up with plenty of other possibilities, some of
 which might actually be useful, many of which will be useful only for very
 contrived cases, and some that are not useful today but may 

Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Jed Brown
Richard Mills r...@utk.edu writes:

 On Wed, Jun 3, 2015 at 6:04 PM, Jed Brown j...@jedbrown.org wrote:

 Have you heard anything back about whether move_pages() will work?


 move_pages() will work to move pages between MCDRAM and DRAM right
 now, 

Great!

 but it screws up memkind's partitioning of the heap (it won't be aware
 that the pages have been moved). 

Then memkind is stupid or the kernel isn't exposing the correct
information to memkind.  Tell them to not be lazy and do it right.

 Jed, I'm with you in thinking that, ultimately, there actually needs to be
 a way to make these kinds of decisions based on global information.  We
 don't have that right now.  But if we get some smart allocator (and
 migrator) that gives us, say malloc_use_oracle() to always make the good
 decision, 

The oracle has to see into the future.  move_pages() is so much more
powerful.

 we still should have something like a PetscAdvMalloc() that provides a
 context to allow us to pass advice to this smart allocator to provide
 hints about how it will be accessed, whatever.

What does the caller know?  What good is the context if we always pass
I_HAVE_NO_IDEA?

 In a lot of cases, simple size-based allocation is probably the way to go.
 An option to do automatic size-based placement is even in the latest
 memkind sources on github now, but it will do that for the entire
 application.  

That's crude; I'd rather have each library use its own threshold.

 I'd like to be able to restrict this to only the PETSc portion: Maybe
 a code that uses PETSc also needs to allocate some enormous lookup
 tables that are big but have accesses that are really latency- rather
 than bandwidth-sensitive.  Or, to be specific to a code I actually
 know, I believe that in PFLOTRAN there are some pretty large
 allocations required for auxiliary variables that don't need to go in
 high-bandwidth memory, though we will want all of the large PETSc
 objects to go in there.

Fine.  That involves a couple lines of code.  Go into PetscMallocAlign
and add the ability to use memkind.  Add a run-time option to control
the threshold.  Done.
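
Something like this minimal sketch (the function and option names are
invented; the real PetscMallocAlign() has a different signature):

  #include <memkind.h>

  /* Dumb size threshold: big blocks prefer high-bandwidth memory, small
     blocks stay in DRAM.  The threshold would come from a run-time
     option, e.g. a hypothetical -malloc_hbw_threshold. */
  static size_t hbw_threshold = 1048576;  /* illustrative 1 MiB default */

  static void *MallocWithThreshold(size_t size)
  {
    if (size >= hbw_threshold)
      return memkind_malloc(MEMKIND_HBW_PREFERRED, size);
    return memkind_malloc(MEMKIND_DEFAULT, size);
  }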

If you want complexity to bleed into the library (and necessarily into
user code if given any power at all), I think you need to demonstrate a
tangible benefit that cannot be obtained by something simpler.  Consider
the simple and dumb threshold above to be the null hypothesis.

This is just my opinion.  Feel free to make a branch with whatever you
prefer.




Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Jed Brown
Barry Smith bsm...@mcs.anl.gov writes:
   Richard has access to the hardware

Is this true?  Or he will have hardware soon?

   and is not going to lie to us that "oh it helps so much" because
   he knows that you will test it yourself and see that he is lying. 

I'm not at all worried about him lying, but I'm concerned about being
able to sample across a sufficiently broad range of apps/configurations.
Maybe he can run some PETSc examples and PFLOTRAN, which is a good
start, but may not be running in the appropriately memory-constrained
circumstances of a package with particles like pTatin, for example.  We
care not just about the highs but also about the confusing corners that
users will undoubtedly encounter.




Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Barry Smith

  Jeff,

   Ahh, from this page, it is definitively clear that the Intel people have 
their heads totally up their asses 

formatted source code with astyle --style=linux --indent=spaces=4 -y -S

when everyone knows that any indent that is not 2 characters is totally insane 
:-)

  Barry


 On Jun 3, 2015, at 9:37 PM, Jeff Hammond jeff.scie...@gmail.com wrote:
 
 but it screws up memkind's partitioning of the heap (it won't be aware
 that the pages have been moved).
 
 Then memkind is stupid or the kernel isn't exposing the correct
 information to memkind.  Tell them to not be lazy and do it right.
 
 The beauty of git/github is one can make branches to try out anything
 they want even if Jed thinks that he knows better than Intel how to
 write system software for Intel's hardware.
 
 This link is equivalent to pushing the Fork button on Github's
 memkind page: https://github.com/memkind/memkind#fork-destination-box.
 I'm sure that the memkind developers would be willing to review your
 pull request once you've implemented memkind_move_pages().
 
 Jeff
 
 -- 
 Jeff Hammond
 jeff.scie...@gmail.com
 http://jeffhammond.github.io/



Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Jed Brown
Barry Smith bsm...@mcs.anl.gov writes:
   Perhaps, and this is just nonsense off the top of my head, if you
   had some measure of the importance of a vector (or matrix; I would
   start with vectors for simplicity and since we have more of them)
   based on how often its values would be accessed. So a vector that
   you know is only used once in a while gets a lower importance
   than one that gets used very often. Of course determining these
   vectors' importances may be difficult. You could do it
   experimentally: add some code that measures how often each vector
   gets its values accessed (whatever that means) for read/write and see
   if there is some distribution (do this for a nontrivial TS example)
   where some vectors are accessed often and others rarely. 

This is what I termed profile-guided and it's very accurate (you have
global space-time information), but super brittle when
resource-constrained.

Note that in case of Krylov solvers, the first vectors in the Krylov
space are accessed far more than later vectors (e.g., the 30th vector is
accessed once per 30 iterations versus the first vector which is
accessed every iteration).  Simple greedy allocation is great for this
case.

It's terrible in other cases, a simple case of which is two solvers
where the first is cheap (or solved only rarely) and the second is
solved repeatedly at great expense.  Nested solvers are one such
example.  But you don't know which one is more expensive except in
retrospect, and this can even change as nonlinearities evolve.




Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Jed Brown
Barry Smith bsm...@mcs.anl.gov writes:
   Even if it helps in only 30 percent of applications that is still
   a good thing (and a great thing politically). Then it becomes an
   issue of education and proper profiling tools to tell people, for
   their apps, that it won't work, so the other 70% is not confused.

How much does it have to help those 30% if the complexity contributes to
driving 30% of potential new users away from HPC?

I'm in favor of doing the simplest thing until presented with
overwhelming evidence that the complicated thing is necessary.  I
understand that this doesn't win grants; you have to say that the simple
thing that has been working will never work at exascale.

   Note that Marc Snir today told me that it is perfectly fine if the
   largest computing systems, i.e. the LCFs, can only provide useful
   performance for a small subset of all possible applications.

Even when that small subset does not contain the primary apps used to
sell the machines to Congress.  It's just too difficult to have a
consistent story.




Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Jed Brown
Barry Smith bsm...@mcs.anl.gov writes:
   Sure but the super high-end (DOE LCF centers) focus allows (actually
   loves) the need for super brittle stuff.  

Job security.

   It is not the bread and butter of PETSc but if they (silly ASCR) are
   willing to foot our bills to do our bread and butter by pandering to
   the super brittle high-end what's the harm in pandering (aside
   from our souls) since we are not actually doing the work :-)

I have ethical objections to ruining the careers of scientists capable
of doing important things.




Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Barry Smith

  The beauty of git/bitbucket is one can make branches to try out anything they 
want even if some cranky old conservative PETSc developer thinks it is worse 
than consorting with the devil.

   As I said before, I think that the additional argument to advised_malloc should 
be a living object which one can change over time, as opposed to just a 
flag-type argument that only affects the malloc at malloc time. Of course the 
living part can be implemented later.

   Barry

Yes, Jed has already transformed himself into a cranky old conservative PETSc 
developer


 On Jun 3, 2015, at 7:33 PM, Richard Mills r...@utk.edu wrote:
 
 Hi Folks,
 
 It's been a while, but I'd like to pick up this discussion of adding a 
 context to memory allocations again.
 
 The immediate motivation I have is that I'd like to support use of the 
 memkind library (https://github.com/memkind/memkind), though adding a context 
 to PetscMallocN() (or making some other interface, say PetscAdvMalloc() or 
 whatever) could have much broader utility than simply memkind support (which 
 Jed doesn't like anyway, and I share some of his concerns).  For the sake of 
 having a concrete example, I'll discuss memkind here.
 
 Memkind's memkind_malloc() works like malloc() but takes a memkind_t argument 
 to specify some desired property of the memory being allocated.  For example, 
 
  hugetlb_str = (char *)memkind_malloc(MEMKIND_HUGETLB, size);
 
 returns a pointer to memory allocated using huge pages, and 
 
  hbw_preferred_str = (char *)memkind_malloc(MEMKIND_HBW_PREFERRED, size);
 
 allocates memory from a high-bandwidth region if it's available and elsewhere 
 if not (specifying MEMKIND_HBW will insist on the allocation coming from 
 high-bandwidth memory, failing if it's not available).
 
 It should be straightforward to add a variant of PetscMalloc() that accepts a 
 context: I'll call this PetscAdvMalloc(), for now, though we can come up with 
 a better name later.  This will allow passing on the memkind_t via this 
 context to the underlying memkind allocator, and we can have some mechanism 
 to set a default context (in the case of Memkind, this is likely 
 MEMKIND_DEFAULT) that gets used when plain PetscMalloc() gets called.
 
 Of course, we'll need some way to ensure that the advanced malloc gets used 
 to allocate the critical data structures.  As a low-level way to start, it 
 may make sense to simply add a way to stash a context in Vec and Mat objects. 
  Maybe have VecSetAdvMallocCtx(), and if that context gets set, then 
 PetscAdvMalloc() is used for the allocations associated with the contents of 
 that object.  It would probably be better to eventually have a higher-level 
 way to do this, e.g., support standard settings in the options database that 
 PETSc uses to construct the appropriate arguments to underlying allocators 
 that are supported, but I think just adding a way to set this context 
 directly is an appropriate first step.
   
 Does this sound like a reasonable thing for me to prototype, or are others 
 thinking something very different?  Please let me know.  I'm getting more 
 access to early systems I can experiment on, and I'd really like to move 
 forward on trying things with high bandwidth memory (imperfect as our APIs 
 for using it are).
 
 Best regards,
 Richard
 
 
 On Wed, Apr 29, 2015 at 11:10 PM, Richard Mills r...@utk.edu wrote:
 On Wed, Apr 29, 2015 at 1:28 PM, Barry Smith bsm...@mcs.anl.gov wrote:
 
   Forget about the issue of changing PetscMallocN() or adding a new 
 interface instead, that is a minor syntax and annoyance issue:
 
   The question is: is it worth exploring adding a context for certain memory 
 allocations that would allow us to do various things to the memory and 
 indicate properties of the memory? I think, though I agree with Jed that 
 it could be fraught with difficulties, that it is worthwhile playing around 
 with this.
 
   Barry
 
 
 I vote yes.  One might want to, say
 
 * Give hints via something like madvise() on how/when the memory might be 
 accessed.
 * Specify a preferred kind of memory (and behavior if the preferred kind is 
 not available, or perhaps even specify a priority on how hard to try to get 
 the preferred memory kind)
 * Specify something like a preference to interleave allocation blocks between 
 different kinds of memory
 
 I'm sure we can come up with plenty of other possibilities, some of which 
 might actually be useful, many of which will be useful only for very 
 contrived cases, and some that are not useful today but may become useful as 
 memory systems evolve.
 
 --Richard
 



Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Richard Mills
On Wed, Jun 3, 2015 at 6:54 PM, Jed Brown j...@jedbrown.org wrote:

 Richard Mills r...@utk.edu writes:

  On Wed, Jun 3, 2015 at 6:04 PM, Jed Brown j...@jedbrown.org wrote:
 
  Have you heard anything back about whether move_pages() will work?
 
 
  move_pages() will work to move pages between MCDRAM and DRAM right
  now,

 Great!

  but it screws up memkind's partitioning of the heap (it won't be aware
  that the pages have been moved).

 Then memkind is stupid or the kernel isn't exposing the correct
 information to memkind.  Tell them to not be lazy and do it right.


I believe that it really comes down to a problem with what the Linux kernel
allows right now.  To do this right we need to hack the kernel.  Memkind
is working within the constraints of what the kernel currently does.



  Jed, I'm with you in thinking that, ultimately, there actually needs to
 be
  a way to make these kinds of decisions based on global information.  We
  don't have that right now.  But if we get some smart allocator (and
  migrator) that gives us, say malloc_use_oracle() to always make the good
  decision,

 The oracle has to see into the future.  move_pages() is so much more
 powerful.

  we still should have something like a PetscAdvMalloc() that provides a
  context to allow us to pass advice to this smart allocator to provide
  hints about how it will be accessed, whatever.

 What does the caller know?  What good is the context if we always pass
 I_HAVE_NO_IDEA?

  In a lot of cases, simple size-based allocation is probably the way to
 go.
  An option to do automatic size-based placement is even in the latest
  memkind sources on github now, but it will do that for the entire
  application.

 That's crude; I'd rather have each library use its own threshold.

  I'd like to be able to restrict this to only the PETSc portion: Maybe
  a code that uses PETSc also needs to allocate some enormous lookup
  tables that are big but have accesses that are really latency- rather
  than bandwidth-sensitive.  Or, to be specific to a code I actually
  know, I believe that in PFLOTRAN there are some pretty large
  allocations required for auxiliary variables that don't need to go in
  high-bandwidth memory, though we will want all of the large PETSc
  objects to go in there.

 Fine.  That involves a couple lines of code.  Go into PetscMallocAlign
 and add the ability to use memkind.  Add a run-time option to control
 the threshold.  Done.


Hmm.  That's a simpler solution that may be better.  I'm not sure that it
will always be the best thing to do, but in cases where it is appropriate,
that simple option sounds like something we should support.

I assume you'd also like an option to specify that the allocation should
fail if high bandwidth memory cannot be allocated, to avoid seeing very
confusing performance.
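
That distinction maps directly onto memkind's (real) kinds:

  #include <stddef.h>
  #include <memkind.h>

  /* MEMKIND_HBW fails (returns NULL) when high-bandwidth memory is
     unavailable; MEMKIND_HBW_PREFERRED silently falls back to DRAM. */
  void *alloc_hbw(size_t size, int strict)
  {
    return memkind_malloc(strict ? MEMKIND_HBW : MEMKIND_HBW_PREFERRED, size);
  }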



 If you want complexity to bleed into the library (and necessarily into
 user code if given any power at all), I think you need to demonstrate a
 tangible benefit that cannot be obtained by something simpler.  Consider
 the simple and dumb threshold above to be the null hypothesis.

 This is just my opinion.  Feel free to make a branch with whatever you
 prefer.



Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Richard Mills
On Wed, Jun 3, 2015 at 7:28 PM, Jed Brown j...@jedbrown.org wrote:

 Barry Smith bsm...@mcs.anl.gov writes:
Richard has access to the hardware

 Is this true?  Or he will have hardware soon?


Yes, I finally have access to hardware.  It's a bit hard to get time on,
and it's flaky because it is from the initial tape-in, but it's here and
I've run on it.  That's what prompted me to bring this thread up again.



and is not going to lie to us that oh it helps so much because
he knows that you will test it yourself and see that he is lying.

 I'm not at all worried about him lying, but I'm concerned about being
 able to sample across a sufficiently broad range of apps/configurations.
 Maybe he can run some PETSc examples and PFLOTRAN, which is a good
 start, but may not be running in the appropriately memory-constrained
 circumstances of a package with particles like pTatin, for example.  We
 care not just about the highs but also about the confusing corners that
 users will undoubtedly encounter.


Good point, Jed.  And I anticipate a lot of confusing corners.

--Richard


Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Barry Smith

 On Jun 3, 2015, at 9:28 PM, Jed Brown j...@jedbrown.org wrote:
 
 Barry Smith bsm...@mcs.anl.gov writes:
  Richard has access to the hardware
 
 Is this true?  Or he will have hardware soon?
 
  and is not going to lie to us that oh it helps so much because
  he knows that you will test it yourself and see that he is lying. 
 
 I'm not at all worried about him lying, but I'm concerned about being
 able to sample across a sufficiently broad range of apps/configurations.
 Maybe he can run some PETSc examples and PFLOTRAN, which is a good
 start, but may not be running in the appropriately memory-constrained
 circumstances of a package with particles like pTatin, for example.  We
 care not just about the highs but also about the confusing corners that
 users will undoubtedly encounter.

  Even if it helps in only 30 percent of applications that is still a good 
thing (and a great thing politically). Then it becomes an issue of education 
and proper profiling tools to tell people, for their apps, that it won't work, 
so the other 70% is not confused.

  Note that Marc Snir today told me that it is perfectly fine if the largest 
computing systems, i.e. the LCFs, can only provide useful performance for a 
small subset of all possible applications.

  Barry







Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Barry Smith

 On Jun 3, 2015, at 9:58 PM, Jeff Hammond jeff.scie...@gmail.com wrote:
 
 http://git.mpich.org/mpich.git/blob/HEAD:/src/mpi/init/init.c
 https://github.com/open-mpi/ompi/blob/master/ompi/mpi/c/init.c

  As I said, super insane :-)

  Barry

  I'm just having fun here; I do believe that 2 is the ultimate correct 
indentation but I can always run a preprocessor to fix their code before I use 
it :-)

 
 Jeff
 
 On Wed, Jun 3, 2015 at 9:43 PM, Barry Smith bsm...@mcs.anl.gov wrote:
 
  Jeff,
 
   Ahh, from this page, it is definitively clear that the Intel people have 
 their heads totally up their asses
 
 formatted source code with astyle --style=linux --indent=spaces=4 -y -S
 
 when everyone knows that any indent that is not 2 characters is totally 
 insane :-)
 
  Barry
 
 
 On Jun 3, 2015, at 9:37 PM, Jeff Hammond jeff.scie...@gmail.com wrote:
 
 but it screws up memkind's partitioning of the heap (it won't be aware
 that the pages have been moved).
 
 Then memkind is stupid or the kernel isn't exposing the correct
 information to memkind.  Tell them to not be lazy and do it right.
 
 The beauty of git/github is one can make branches to try out anything
 they want even if Jed thinks that he knows better than Intel how to
 write system software for Intel's hardware.
 
 This link is equivalent to pushing the Fork button on Github's
 memkind page: https://github.com/memkind/memkind#fork-destination-box.
 I'm sure that the memkind developers would be willing to review your
 pull request once you've implemented memkind_move_pages().
 
 Jeff
 
 --
 Jeff Hammond
 jeff.scie...@gmail.com
 http://jeffhammond.github.io/
 
 
 
 
 -- 
 Jeff Hammond
 jeff.scie...@gmail.com
 http://jeffhammond.github.io/



Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Barry Smith

 On Jun 3, 2015, at 10:08 PM, Jed Brown j...@jedbrown.org wrote:
 
 Barry Smith bsm...@mcs.anl.gov writes:
  Even if it helps in only 30 percent of applications that is still
  a good thing (and a great thing politically). Then it becomes an
  issue of education and proper profiling tools to tell people for
  their apps that it won't work; so the other 70% is not confused.
 
 How much does it have to help those 30% if the complexity contributes to
 driving 30% of potential new users away from HPC?

  It is OUR job as PETSc developers to hide that complexity from most 
people who would be driven away from HPC because of it. Thus if Richard 
proposed changing VecCreate() to VecCreate(MPI_Comm, Crazy Intel-specific 
Memkind options, Vec *x); we would reject it. He is not even coming close to 
proposing that; in fact he is not proposing anything, he is just asking for 
advice on how to run some experiments to see if the Phi crazy memory shit can 
be beneficial to some PETSc apps.


 
 I'm in favor of doing the simplest thing until presented with
 overwhelming evidence that the complicated thing is necessary.

   Says the man who suggested the PetscThreadComm stuff in PETSc that was 
recently removed because it was too complicated and had too few (no) benefits :-)

  I
 understand that this doesn't win grants; you have to say that the simple
 thing that has been working will never work at exascale.
 
  Note that Marc Snir today told me that it is perfectly fine if the
  largest computing systems, i.e. the LCFs can only provide useful
  performance for a small subset of all possible applications.
 
 Even when that small subset does not contain the primary apps used to
 sell the machines to Congress.  It's just too difficult to have a
 consistent story.

   The story to Congress is: China might beat us if you don't give us money, 
any other effect is third order at best.




Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Jed Brown
Jeff Hammond jeff.scie...@gmail.com writes:

 On Wed, Jun 3, 2015 at 9:58 PM, Jed Brown j...@jedbrown.org wrote:
 Jeff Hammond jeff.scie...@gmail.com writes:
 The beauty of git/github is one can make branches to try out anything
 they want even if Jed thinks that he knows better than Intel how to
 write system software for Intel's hardware.

 I'm objecting to the interface.  I think that if they try to get memkind
 merged into the existing libnuma project, they'll see similar
 resistance.  It is essential for low-level interfaces to create
 foundations that can be reliably built upon, not gushing wounds that
 bleed complexity into everything built on top.

 Step 1: Commit a change associated with the new interface function.

This response is not germane to my point above.  I also never asked for
a new interface.

 Step 2: Commit a change implementing the new interface function.
 Step 3: File a pull request.

 This link is equivalent to pushing the Fork button on Github's
 memkind page: https://github.com/memkind/memkind#fork-destination-box.
 I'm sure that the memkind developers would be willing to review your
 pull request once you've implemented memkind_move_pages().

 1. I cannot test it because I don't have access to the hardware.

 The memkind library itself was developed entirely without access to
 the hardware to which you refer, so this complaint is not relevant.

The interesting case here is testing failure modes in the face of
resource exhaustion, which doesn't seem to have been addressed in a
serious way by memkind and requires other trickery to test without
MCDRAM.  Also, the performance effects are relevant.  But I don't want
anything else in memkind because I don't want to use memkind for
anything ever.

 2. I think memkind is solving the wrong problem in the wrong way.

 It is more correct to say it is solving a different problem than the
 one you care about.  memkind is the correct way to solve the problem
 it is trying to solve.  Please stop equating your disagreement with
 the problem statement as evidence that the solution is terrible.

This is pedantry.  Is there a clear statement of what problem memkind
solves?

  The memkind library is a user extensible heap manager built on top of
  jemalloc which enables control of memory characteristics and a
  partitioning of the heap between kinds of memory.

This is just a low-level statement about what it does and I would argue
it doesn't even do this in a useful way because it is entirely at
allocation time assuming the caller is omniscient.

 3. According to Richard, the mature move_pages(2) interface has been
 implemented.  That's what I wanted, so I'll just use that -- memkind
 dependency gone.

 Does this mean that you will stop complaining about memkind, since it
 is not directly relevant to your life?  I would like that.

Yes, as soon as people stop telling me that I should use memkind and
stop asking to put it into packages I interact with, I'll ignore it like
countless other projects that are irrelevant to what I do.  But if, like
OpenMP, the turd keeps landing in my breakfast, I'm probably going to
mention that it's impolite to keep pooping in my breakfast.




Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Barry Smith

 On Jun 3, 2015, at 10:35 PM, Jed Brown j...@jedbrown.org wrote:
 
 Barry Smith bsm...@mcs.anl.gov writes:
 
  It is OUR job as PETSc developers to hide that complexity from the
  people who would otherwise be driven away from HPC because of it. 
 
 Absolutely.  So now the question becomes what benefit can this have,
 predicated on not letting the complexity bleed onto the user.
 
  Thus if Richard proposed changing VecCreate() to VecCreate(MPI_Comm,
  Crazy Intel specific Memkind options, Vec *x); we would reject
  it. He is not even coming close to proposing that, in fact he is not
  proposing anything, he is just asking for advice on how to run some
  experiments to see if the Phi crazy memory shit can be beneficial to
  some PETSc apps.
 
 And my advice is to start with the simplest thing possible.
 
 I'm also expressing skepticism that a more sophisticated solution that
 _does not bleed complexity on the user_ is capable of substantially
 beating the simple thing across a meaningful range of applications.

  There you go again with "meaningful range of applications." Why can't you get 
it through your head that if cosmology science advances at all from exascale 
(which I doubt it will, speaking of unethical bastards) then all of exascale is 
like totally worthwhile :-)

 
   Says the man who suggested the PetscThreadComm stuff in PETSc that
   was recently removed because it was too complicated and had too few
   (no) benefits :-)
 
 Yes, I was trying to solve a problem that didn't need to be solved.  My
 mistake.
 
   The story to Congress is: China might beat us if you don't give us
   money, any other effect is third order at best.
 
 A smart Congress would say redefine 'beat us' to something that matters
 and stop wasting your time on vanity.

  Two words that can never be next to each other: smart congress




Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Barry Smith

 On Jun 3, 2015, at 9:00 PM, Jed Brown j...@jedbrown.org wrote:
 
 Barry Smith bsm...@mcs.anl.gov writes:
 Yes, Jed has already transformed himself into a cranky old conservative 
 PETSc developer
 
 Is disinclination to spend effort on something with negative expected
 value conservative?
 
 Actually, it's almost the definition.  But if you spend time on
 legitimately high-risk things, you should expect that with high
 probability, they will be a failure.  Thus, it's essential to be
 prepared to declare failure rather than lobbying for success (e.g.,
 merging) without conclusive data.  Declaring failure in this case may be
 hard without access to the hardware to be able to push all the design
 corners.
  
  Richard has access to the hardware and is not going to lie to us that oh, 
it helps so much, because he knows that you will test it yourself and see that 
he is lying. So should we support some 3rd party that wants money (from ASCR) 
to prove (in a publication) that using memkind is a good idea? Absolutely not. 
But should we support Richard in trying some experiments? I don't see the downside.




Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Barry Smith

   To follow up on this, going back to my advice about the object passed to 
malloc being a living object as opposed to just some flags: in the case where 
different vectors may have very different importances at different times in 
the runtime of the simulation, one could switch some vectors from using slow 
to faster memory when one knows the code is switching to a different phase 
where the vector importances are different.

  Barry

  Note that even if Intel cannot provide a way to switch a memory address 
between fast and slow it doesn't really matter from the PETSc point of view, 
since inside any particular PETSc vector we could switch the ->array 
pointer to a different memory location (and copy stuff over if needed) when 
changing a vector from important to unimportant or the opposite (since no code 
outside the vector object knows what the pointer is).


 On Jun 3, 2015, at 9:18 PM, Barry Smith bsm...@mcs.anl.gov wrote:
 
 
 On Jun 3, 2015, at 8:55 PM, Richard Mills r...@utk.edu wrote:
 
 Ha, yes.  I'll try this out, but I do wonder what people's thoughts are on 
 the best way to tag an object like a Vec or Mat for some particular 
 treatment of its placement in memory.  Does doing this at the level of a Mat 
 or Vec (e.g., VecSetAdvMallocCtx() ) sound appropriate?  We could actually 
 make this a part of any PetscObject, but I think that's not necessary.
 
  No idea.
 
  Perhaps, and this is just nonsense off the top of my head, if you had some 
 measure of the importance of a vector (or matrix; I would start with vectors 
 for simplicity and since we have more of them) based on how often its values 
 are accessed. So a vector that you know is only used once in a while 
 gets a lower importance than one that gets used very often. Of course 
 determining these vectors' importances may be difficult. You could do it 
 experimentally: add some code that measures how often each vector gets its 
 values accessed (whatever that means), read or write, and see if there is some 
 distribution (do this for a nontrivial TS example) where some vectors are 
 accessed often and others rarely. Now place the often accessed vectors in 
 faster memory and see how much faster the code is.
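
 A rough sketch of that counting experiment, hanging counters off a Vec with a
 PetscContainer; the wrapper and struct are hypothetical, not existing PETSc
 code (the counter is deliberately leaked at destroy, which is fine for a
 throwaway experiment):

  #include <petscvec.h>

  typedef struct { PetscInt reads, writes; } VecAccessCnt;  /* hypothetical */

  /* Call this in place of VecGetArray() in the experiment. */
  static PetscErrorCode VecGetArrayCounted(Vec v, PetscScalar **a)
  {
    PetscContainer c;
    VecAccessCnt   *cnt;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = PetscObjectQuery((PetscObject)v, "AccessCnt", (PetscObject *)&c);CHKERRQ(ierr);
    if (!c) {   /* first access: attach a fresh counter to this vector */
      ierr = PetscNew(&cnt);CHKERRQ(ierr);
      ierr = PetscContainerCreate(PetscObjectComm((PetscObject)v), &c);CHKERRQ(ierr);
      ierr = PetscContainerSetPointer(c, cnt);CHKERRQ(ierr);
      ierr = PetscObjectCompose((PetscObject)v, "AccessCnt", (PetscObject)c);CHKERRQ(ierr);
      ierr = PetscContainerDestroy(&c);CHKERRQ(ierr);  /* Vec holds its own reference */
    } else {
      ierr = PetscContainerGetPointer(c, (void **)&cnt);CHKERRQ(ierr);
    }
    cnt->writes++;              /* a VecGetArrayRead wrapper would bump reads instead */
    ierr = VecGetArray(v, a);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }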
 
  Barry
 
 A related note is that we are not particularly careful about reusing work 
 vectors; say a code has ten different work vectors for different phases of 
 the computation; now imagine a careful global analysis that determined it 
 could get away with three work vectors (since at most three had relevant 
 values at any one time); now pop those three work vectors into faster memory 
 where the ten previous work vectors could not fit. Obviously I am being 
 extreme here to make a point: careful memory decisions could potentially 
 make a difference in complicated codes (and all we care about are complicated 
 codes).
 
 
 
 
 
 --Richard
 
 On Wed, Jun 3, 2015 at 6:50 PM, Barry Smith bsm...@mcs.anl.gov wrote:
 
  The beauty of git/bitbucket is one can make branches to try out anything 
 they want even if some cranky old conservative PETSc developer thinks it is 
 worse than consorting with the devil.
 
   As I said before I think that the additional argument to advised_malloc 
 should be a living object which one can change over time, as opposed to just 
 a flag-type argument that only affects the malloc at malloc time. Of 
 course the living part can be implemented later.
 
   Barry
 
 Yes, Jed has already transformed himself into a cranky old conservative 
 PETSc developer
 
 
 On Jun 3, 2015, at 7:33 PM, Richard Mills r...@utk.edu wrote:
 
 Hi Folks,
 
 It's been a while, but I'd like to pick up this discussion of adding a 
 context to memory allocations again.
 
 The immediate motivation I have is that I'd like to support use of the 
 memkind library (https://github.com/memkind/memkind), though adding a 
 context to PetscMallocN() (or making some other interface, say 
 PetscAdvMalloc() or whatever) could have much broader utility than simply 
 memkind support (which Jed doesn't like anyway, and I share some of his 
 concerns).  For the sake of having a concrete example, I'll discuss memkind 
 here.
 
 Memkind's memkind_malloc() works like malloc() but takes a memkind_t 
 argument to specify some desired property of the memory being allocated.  
 For example,
 
 hugetlb_str = (char *)memkind_malloc(MEMKIND_HUGETLB, size);
 
 returns a pointer to memory allocated using huge pages, and
 
 hbw_preferred_str = (char *)memkind_malloc(MEMKIND_HBW_PREFERRED, size);
 
 allocates memory from a high-bandwidth region if it's available and 
 elsewhere if not (specifying MEMKIND_HBW will insist on the allocation 
 coming from high-bandwidth memory, failing if it's not available).
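
For concreteness, a minimal standalone program using exactly these calls
(link with -lmemkind); the fallback behavior is as described above:

  #include <stdio.h>
  #include <stdlib.h>
  #include <memkind.h>

  int main(void)
  {
    size_t i, n = 1 << 20;
    /* Prefer MCDRAM, but fall back to ordinary DRAM if it is absent or full. */
    double *x = (double *)memkind_malloc(MEMKIND_HBW_PREFERRED, n * sizeof(double));
    if (!x) { fprintf(stderr, "allocation failed\n"); return 1; }
    for (i = 0; i < n; i++) x[i] = 1.0;
    printf("x[0] = %g\n", x[0]);
    memkind_free(MEMKIND_HBW_PREFERRED, x);  /* free with the kind used to allocate */
    return 0;
  }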
 
 It should be straightforward to add a variant of PetscMalloc() that accepts 
 a context: I'll call this PetscAdvMalloc(), for now, though we can come up 
 with a better name later.  This will allow passing on the memkind_t via 
 this context to the underlying 

Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Jeff Hammond
On Wed, Jun 3, 2015 at 9:58 PM, Jed Brown j...@jedbrown.org wrote:
 Jeff Hammond jeff.scie...@gmail.com writes:
 The beauty of git/github is one can make branches to try out anything
 they want even if Jed thinks that he knows better than Intel how to
 write system software for Intel's hardware.

 I'm objecting to the interface.  I think that if they try to get memkind
 merged into the existing libnuma project, they'll see similar
 resistance.  It is essential for low-level interfaces to create
 foundations that can be reliably built upon, not gushing wounds that
 bleed complexity into everything built on top.

Step 1: Commit a change associated with the new interface function.
Step 2: Commit a change implementing the new interface function.
Step 3: File a pull request.

 This link is equivalent to pushing the Fork button on Github's
 memkind page: https://github.com/memkind/memkind#fork-destination-box.
 I'm sure that the memkind developers would be willing to review your
 pull request once you've implemented memkind_move_pages().

 1. I cannot test it because I don't have access to the hardware.

The memkind library itself was developed entirely without access to
the hardware to which you refer, so this complaint is not relevant.

 2. I think memkind is solving the wrong problem in the wrong way.

It is more correct to say it is solving a different problem than the
one you care about.  memkind is the correct way to solve the problem
it is trying to solve.  Please stop treating your disagreement with
the problem statement as evidence that the solution is terrible.

 3. According to Richard, the mature move_pages(2) interface has been
 implemented.  That's what I wanted, so I'll just use that -- memkind
 dependency gone.

Does this mean that you will stop complaining about memkind, since it
is not directly relevant to your life?  I would like that.

Jeff

-- 
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/


Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Jeff Hammond
If everyone would just indent with tabs, we could just set the indent
spacing with our editors ;-)

On Wed, Jun 3, 2015 at 10:01 PM, Barry Smith bsm...@mcs.anl.gov wrote:

 On Jun 3, 2015, at 9:58 PM, Jeff Hammond jeff.scie...@gmail.com wrote:

 http://git.mpich.org/mpich.git/blob/HEAD:/src/mpi/init/init.c
 https://github.com/open-mpi/ompi/blob/master/ompi/mpi/c/init.c

   As I said, super insane :-)

   Barry

   I'm just having fun here; I do believe that 2 is the ultimate correct 
 indentation but I can always run a preprocessor to fix their code before I 
 use it :-)


 Jeff

 On Wed, Jun 3, 2015 at 9:43 PM, Barry Smith bsm...@mcs.anl.gov wrote:

  Jeff,

   Ahh, from this page, it is definitively clear that the Intel people have 
 their heads totally up their asses

 formatted source code with astyle --style=linux --indent=spaces=4 -y -S

 when everyone knows that any indent that is not 2 characters is totally 
 insane :-)

  Barry


 On Jun 3, 2015, at 9:37 PM, Jeff Hammond jeff.scie...@gmail.com wrote:

 but it screws up memkind's partitioning of the heap (it won't be aware
 that the pages have been moved).

 Then memkind is stupid or the kernel isn't exposing the correct
 information to memkind.  Tell them to not be lazy and do it right.

 The beauty of git/github is one can make branches to try out anything
 they want even if Jed thinks that he knows better than Intel how to
 write system software for Intel's hardware.

 This link is equivalent to pushing the Fork button on Github's
 memkind page: https://github.com/memkind/memkind#fork-destination-box.
 I'm sure that the memkind developers would be willing to review your
 pull request once you've implemented memkind_move_pages().

 Jeff

 --
 Jeff Hammond
 jeff.scie...@gmail.com
 http://jeffhammond.github.io/




 --
 Jeff Hammond
 jeff.scie...@gmail.com
 http://jeffhammond.github.io/




-- 
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/


Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-06-03 Thread Jed Brown
Barry Smith bsm...@mcs.anl.gov writes:

   It is OUR job as PETSc developers to hide that complexity from the
   people who would otherwise be driven away from HPC because of it. 

Absolutely.  So now the question becomes what benefit can this have,
predicated on not letting the complexity bleed onto the user.

   Thus if Richard proposed changing VecCreate() to VecCreate(MPI_Comm,
   Crazy Intel specific Memkind options, Vec *x); we would reject
   it. He is not even coming close to proposing that, in fact he is not
   proposing anything, he is just asking for advice on how to run some
   experiments to see if the Phi crazy memory shit can be beneficial to
   some PETSc apps.

And my advice is to start with the simplest thing possible.

I'm also expressing skepticism that a more sophisticated solution that
_does not bleed complexity on the user_ is capable of substantially
beating the simple thing across a meaningful range of applications.

Says the man who suggested the PetscThreadComm stuff in PETSc that
was recently removed because it was too complicated and had too few
(no) benefits :-)

Yes, I was trying to solve a problem that didn't need to be solved.  My
mistake.

The story to Congress is: China might beat us if you don't give us
money, any other effect is third order at best.

A smart Congress would say redefine 'beat us' to something that matters
and stop wasting your time on vanity.




Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-30 Thread Richard Mills
On Tue, Apr 28, 2015 at 10:47 PM, Jed Brown j...@jedbrown.org wrote:

 Richard Mills r...@utk.edu writes:

 [...]
  I think many users are going to want more control than what something like
  AutoHBW provides, but, as you say, a lot of the time one will only care
  about the substantial allocations for things like matrices and vectors,
  and these also tend to be long lived--plenty of codes will do something
  like allocate a matrix for Jacobians once and keep it around for the
  lifetime of the run.  Maybe we should consider not using a heap manager for
  these allocations, then.  For allocations above some specified threshold,
  perhaps we (PETSc) should simply do the appropriate mmap() and mbind()
  calls to allocate the pages we need in the desired type of memory, and then
  we could use things like move_pages() if/when appropriate (yes, I know
  we don't yet have a good way to make such decisions).  This would mean
  PETSc getting more into the lower level details of memory management, but
  maybe this is appropriate (and unavoidable) as more kinds of
  user-addressable memory get introduced.  I think it is actually less
  horrible than it sounds, because, really, we would just want to do this for
  the largest allocations.  (And this is somewhat analogous to how many
  malloc() implementations work, anyway: Use sbrk() for the small stuff, and
  mmap() for the big stuff.)

 I say just use malloc (or posix_memalign) for everything.  PETSc can't
 do a better job of the fancy stuff and these normal functions are
 perfectly sufficient.

  That is a regression relative to move_pages.  Just make move_pages work.
  That's the granularity I've been asking for all along.
 
  Cannot practically be done using a heap manager system like memkind.  But
  we can do this if we do our own mmap() calls, as discussed above.

 In practice, we would still use malloc(), but set mallopt
 M_MMAP_THRESHOLD if needed and call move_pages.  The reality is that
 with 4 KiB pages, it doesn't even matter if your large allocation is
 not page aligned.  The first and last page don't matter--they're small
 enough to be inexpensive to re-fetch from DRAM and don't use up that
 much extra space if you map them into MCDRAM.
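
A sketch of that combination, under the assumption that the fast memory shows
up as NUMA node 1 (link with -lnuma; note the pages must be faulted in before
they can be moved):

  #include <malloc.h>    /* mallopt, M_MMAP_THRESHOLD (glibc) */
  #include <numaif.h>    /* move_pages(2); link with -lnuma  */
  #include <stdint.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  /* Migrate the pages backing [addr, addr+len) to NUMA node `node`. */
  static long migrate_range(void *addr, size_t len, int node)
  {
    long           psz    = sysconf(_SC_PAGESIZE);
    char          *start  = (char *)((uintptr_t)addr & ~(uintptr_t)(psz - 1));
    unsigned long  npage  = ((char *)addr + len - start + psz - 1) / psz, i;
    void         **pages  = malloc(npage * sizeof(void *));
    int           *nodes  = malloc(npage * sizeof(int));
    int           *status = malloc(npage * sizeof(int));
    long           rc;

    for (i = 0; i < npage; i++) { pages[i] = start + i * psz; nodes[i] = node; }
    rc = move_pages(0 /* this process */, npage, pages, nodes, status, MPOL_MF_MOVE);
    free(pages); free(nodes); free(status);
    return rc;
  }

  int main(void)
  {
    size_t  n = 1 << 22;
    double *x;
    mallopt(M_MMAP_THRESHOLD, 1 << 16);       /* blocks >= 64 KiB become mmap()-backed */
    x = malloc(n * sizeof(double));
    memset(x, 0, n * sizeof(double));         /* fault the pages in first */
    (void)migrate_range(x, n * sizeof(double), 1);  /* assumed fast-memory node */
    free(x);
    return 0;
  }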


Hmm.  That may be a pretty good solution for DRAM vs. MCDRAM.  What about
when we further complicate things by adding some large pool of NVRAM?  One
might want some sufficiently large arrays to go into MCDRAM, but other
large arrays to go to NVRAM or DRAM.  I guess we can still do the
appropriate move_pages() to get things into the right places, but I can
also see wanting to do things like use a much larger page size for giant
data sets going into NVRAM (which you won't be able to do without a copy to
a different mapped region).  And if there are these and other
complications... then maybe we should be using a heap manager like
memkind.  It would simplify quite a few things EXCEPT we'd have to deal
with the virtual address changing when we want to change the kind of memory.
But maybe this would not be so bad, using an approach like the one Karli outlined.

--Richard


Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-30 Thread Richard Mills
On Wed, Apr 29, 2015 at 2:39 AM, Karl Rupp r...@iue.tuwien.ac.at wrote:

 Hi,

  (...)


 If we want to move data from one memory kind to another, I believe that
 we need to be able to deal with the virtual address changing.  Yes, this
 is a pain because extra bookkeeping is involved.  Maybe we don't want to
 bother with supporting something like this in PETSc.  But I don't know
 of any good way around this.  I have discussed with Chris the idea of
 adding support for asynchronously copying pages between different kinds
 of memory (maybe have a memdup() analog to strdup()) and he had some
 ideas about how this might be done efficiently.  But, again, I don't
 know of a good way to move data to a different memory kind while keeping
 the same virtual address.  If I'm misunderstanding something about what
 is possible with Linux (or other *nix), please let me know--I'd really
 like to be wrong on this.


 let me bring up another spin of this thought: Currently we have related
 issues with managing memory on GPUs. The way we address this topic there is
 that we have a plain host-buffer, and a buffer allocated on the GPU. A
 separate flag keeps track of which buffer holds the most recent data (host,
 GPU, or both). What if we extend this system slightly such that we can also
 deal with HBM?

 Benefits:
  - Changes to code base mainly in *GetArrayReadWrite(), returning the
 'correct' buffer.
  - Command line options as well as APIs for enabling/disabling HBM can be
 easily provided.
  - DRAM fallback always available, even if HBM exhausted.
  - Similar code and logic for dealing with HBM and GPUs.

 Disadvantages:
  - Depending on the actual implementation, we may need extra memory (data
 duplication in HBM and DRAM). Since DRAM >> HBM, this may not be a big
 issue.
  - Some parts of PETSc allocate memory directly rather than using standard
 types. These will not use HBM then. May not be performance-critical,
 though...
  - Asynchronous copies between DRAM and HBM remain tricky.
  - 'Politics': The approach is not as fancy as writing heap managers and
 other low-level stuff, so it's harder to sell to e.g. program managers.
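
A bare-bones sketch of the flag scheme Karli describes, with the GPU buffer
replaced by an HBM buffer (all names hypothetical, not PETSc code):

  #include <stdlib.h>
  #include <string.h>

  typedef enum { VALID_DRAM, VALID_HBM, VALID_BOTH } BufState;

  typedef struct {
    double  *dram;    /* plain host buffer, always present            */
    double  *hbm;     /* high-bandwidth buffer; NULL if HBM exhausted */
    size_t   n;
    BufState valid;   /* which buffer holds the most recent data      */
  } DualVec;

  /* The *GetArray* analogue: return the 'correct' buffer for writing. */
  static double *DualVecGetArrayWrite(DualVec *v)
  {
    if (v->hbm) {
      if (v->valid == VALID_DRAM)                      /* HBM copy is stale */
        memcpy(v->hbm, v->dram, v->n * sizeof(double));
      v->valid = VALID_HBM;                            /* DRAM copy now stale */
      return v->hbm;
    }
    v->valid = VALID_DRAM;    /* DRAM fallback, even when HBM is exhausted */
    return v->dram;
  }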


I like several things about this proposal, and I think it might especially
make sense for systems with very large amounts of NVRAM, and the example
that Barry was talking about in which one might want to somehow mark an
allocation as being a target for eviction if needed.  At the expense of
data duplication, it also helps address problems that might arise if one
tries to access a data structure that is in the middle of being copied from
one memory kind to another.

--Richard



 Best regards,
 Karli




Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-30 Thread Richard Mills
On Wed, Apr 29, 2015 at 1:28 PM, Barry Smith bsm...@mcs.anl.gov wrote:


   Forget about the issue of changing PetscMallocN() or adding a new
 interface instead, that is a minor syntax and annoyance issue:

   The question is: is it worth exploring adding a context for certain
 memory allocations that would allow us to do various things to the memory
 and indicate properties of the memory? I think, though I agree with Jed
 that it could be fraught with difficulties, that it is worthwhile playing
 around with this.

   Barry


I vote yes.  One might want to, say:

* Give hints via something like madvise() on how/when the memory might be
accessed.
* Specify a preferred kind of memory (and behavior if the preferred kind
is not available, or perhaps even specify a priority on how hard to try to
get the preferred memory kind)
* Specify something like a preference to interleave allocation blocks
between different kinds of memory

I'm sure we can come up with plenty of other possibilities, some of which
might actually be useful, many of which will be useful only for very
contrived cases, and some that are not useful today but may become useful
as memory systems evolve.

--Richard
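
To make the list above concrete, a hypothetical context struct; none of these
names exist in PETSc, this is purely illustrative:

  #include <sys/mman.h>   /* POSIX_MADV_* constants */

  typedef enum { MEMPREF_DEFAULT, MEMPREF_HBW, MEMPREF_NVRAM } MemPrefKind;

  typedef struct {
    MemPrefKind kind;        /* preferred kind of memory                      */
    int         strict;      /* fail (1) or fall back quietly (0) if missing  */
    int         priority;    /* how hard to try for the preferred kind        */
    int         madv;        /* e.g. POSIX_MADV_SEQUENTIAL, applied after the
                                allocation via posix_madvise()                */
    int         interleave;  /* interleave blocks across kinds if nonzero     */
  } PetscAdvMallocCtx;

A PetscAdvMalloc(size, &ctx, &result) could then dispatch on ctx.kind and
apply ctx.madv with posix_madvise() once the pages exist.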


Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-30 Thread Richard Mills
On Wed, Apr 29, 2015 at 11:10 PM, Richard Mills r...@utk.edu wrote:

 On Wed, Apr 29, 2015 at 1:28 PM, Barry Smith bsm...@mcs.anl.gov wrote:


   Forget about the issue of changing PetscMallocN() or adding a new
 interface instead, that is a minor syntax and annoyance issue:

   The question is: is it worth exploring adding a context for certain
 memory allocations that would allow us to do various things to the memory
 and indicate properties of the memory? I think, though I agree with Jed
 that it could be fraught with difficulties, that it is worthwhile playing
 around with this.

   Barry


 I vote yes.  One might want to, say:

 * Give hints via something like madvise() on how/when the memory might be
 accessed.
 * Specify a preferred kind of memory (and behavior if the preferred kind
 is not available, or perhaps even specify a priority on how hard to try to
 get the preferred memory kind)
 * Specify something like a preference to interleave allocation blocks
 between different kinds of memory


Let me add to the list of things we might want to do:

  * Specify that huge pages be used.

--Richard


 I'm sure we can come up with plenty of other possibilities, some of which
 might actually be useful, many of which will be useful only for very
 contrived cases, and some that are not useful today but may become useful
 as memory systems evolve.

 --Richard



Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-29 Thread Karl Rupp

Hi,

 (...)


If we want to move data from one memory kind to another, I believe that
we need to be able to deal with the virtual address changing.  Yes, this
is a pain because extra bookkeeping is involved.  Maybe we don't want to
bother with supporting something like this in PETSc.  But I don't know
of any good way around this.  I have discussed with Chris the idea of
adding support for asynchronously copying pages between different kinds
of memory (maybe have a memdup() analog to strdup()) and he had some
ideas about how this might be done efficiently.  But, again, I don't
know of a good way to move data to a different memory kind while keeping
the same virtual address.  If I'm misunderstanding something about what
is possible with Linux (or other *nix), please let me know--I'd really
like to be wrong on this.


let me bring up another spin of this thought: Currently we have related 
issues with managing memory on GPUs. The way we address this topic there 
is that we have a plain host-buffer, and a buffer allocated on the GPU. 
A separate flag keeps track of which buffer holds the most recent data 
(host, GPU, or both). What if we extend this system slightly such that 
we can also deal with HBM?


Benefits:
 - Changes to code base mainly in *GetArrayReadWrite(), returning the 
'correct' buffer.
 - Command line options as well as APIs for enabling/disabling HBM can 
be easily provided.

 - DRAM fallback always available, even if HBM exhausted.
 - Similar code and logic for dealing with HBM and GPUs.

Disadvantages:
 - Depending on the actual implementation, we may need extra memory 
(data duplication in HBM and DRAM). Since DRAM >> HBM, this may not be a 
big issue.
 - Some parts of PETSc allocate memory directly rather than using 
standard types. These will not use HBM then. May not be 
performance-critical, though...

 - Asynchronous copies between DRAM and HBM remain tricky.
 - 'Politics': The approach is not as fancy as writing heap managers 
and other low-level stuff, so it's harder to sell to e.g. program managers.


Best regards,
Karli



Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-29 Thread Barry Smith

 On Apr 28, 2015, at 10:44 PM, Jed Brown j...@jedbrown.org wrote:
 
 Barry Smith bsm...@mcs.anl.gov writes:
  The special malloc would need to save the locations at which it set
  the addresses and then switch the address to NULL. Then the code
  that used those locations would have to know that they may
  be set to NULL and hence check them before use.
 
 And then PetscMalloc(...,tmp); foo->data = tmp; creates SEGV at some
 unpredictable time.  Awesome!

   Obviously it is a controlled malloc that has to be used properly. If you 
know that you are getting some unreliable location you cannot write this type 
of code, nor would you. And since we are using the malloc in our code, users 
rarely need to use it directly.

 
  I am not saying this particular thing would be practical or not,
  just that if we had a concept of a malloc context for each malloc
  there are many games we could try that we couldn't try otherwise and
  this is just one of them.
 
 I'm not convinced, except in the case of mixing in madvise hints.



Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-29 Thread Barry Smith

 On Apr 29, 2015, at 12:29 AM, Richard Mills r...@utk.edu wrote:
 
 I think this is maybe getting away from the heart of the discussion, but 
 anyway: Barry, you are talking about being able to mark certain allocated 
 regions as optional and then have those allocations disappear, right?  I 
 could see, say, having such an optional allocation that is backed by a file 
 on some sort of fast storage (some super-fancy SSD or NVRAM) and whose 
 pages can be evicted if the memory is needed.  I don't know that I like a 
 pointer to that just being nullified if the eviction occurs, though.  For a 
 case like this, I'd like to do something like have a Vec and only have this 
 happen to the array of values if no get is outstanding.  This puts us back 
  with an array of numerical values that could get swapped around.

   There are many possibilities, from just release this memory to if you 
really need to you can move this to slower memory. 

   For example, after a DMRestoreLocalVector() the array memory could be 
marked as release if you need the space; then on the next DMGetLocalVector() 
it would check if the memory had been released and allocate it again. If it 
had not been released then it would just use it.

   Clearly you can't just mark memory that is being passed around randomly in 
user code as release if you need the space, but you can for carefully 
controlled memory.

  Barry
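
As an aside, Linux later grew a primitive with roughly these
release-if-you-need-the-space semantics: madvise(MADV_FREE), which only
appeared in Linux 4.5 and so postdates this thread. A sketch for a
page-aligned work array:

  #include <stddef.h>
  #include <sys/mman.h>   /* madvise; MADV_FREE requires Linux >= 4.5 */

  /* Mark a page-aligned work array as reclaimable-under-pressure.  If the
     kernel never needs the pages, the data survives and the next
     DMGetLocalVector()-style reuse pays nothing; if it does reclaim them,
     the contents must be treated as garbage and refilled. */
  static void work_array_release_if_needed(void *a, size_t bytes)
  {
    madvise(a, bytes, MADV_FREE);
  }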

 
 On Tue, Apr 28, 2015 at 8:44 PM, Jed Brown j...@jedbrown.org wrote:
 Barry Smith bsm...@mcs.anl.gov writes:
    The special malloc would need to save the locations at which it set
    the addresses and then switch the address to NULL. Then the code
    that used those locations would have to know that they may
    be set to NULL and hence check them before use.
 
 And then PetscMalloc(...,tmp); foo->data = tmp; creates SEGV at some
 unpredictable time.  Awesome!
 
I am not saying this particular thing would be practical or not,
just that if we had a concept of a malloc context for each malloc
there are many games we could try that we couldn't try otherwise and
this is just one of them.
 
 I'm not convinced, except in the case of mixing in madvise hints.
 



Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-29 Thread Barry Smith

  Forget about the issue of changing PetscMallocN() or adding a new interface 
instead, that is a minor syntax and annoyance issue:  

  The question is: is it worth exploring adding a context for certain memory 
allocations that would allow us to do various things to the memory and 
indicate properties of the memory? I think, though I agree with Jed that it 
could be fraught with difficulties, that it is worthwhile playing around with 
this.

  Barry


 On Apr 29, 2015, at 3:17 PM, Matthew Knepley knep...@gmail.com wrote:
 
 On Thu, Apr 30, 2015 at 6:13 AM, Barry Smith bsm...@mcs.anl.gov wrote:
 
  On Apr 29, 2015, at 12:29 AM, Richard Mills r...@utk.edu wrote:
 
  I think this is maybe getting away from the heart of the discussion, but 
  anyway: Barry, you are talking about being able to mark certain allocated 
  regions as optional and then have those allocations disappear, right?  I 
  could see, say, having such an optional allocation that is backed by a 
  file on some sort of fast storage (some super-fancy SSD or NVRAM) and 
  whose pages can be evicted if the memory is needed.  I don't know that I 
  like a pointer to that just being nullified if the eviction occurs, though. 
   For a case like this, I'd like to do something like have a Vec and only 
  have this happen to the array of values if no get is outstanding.  This 
   puts us back with an array of numerical values that could get swapped 
  around.
 
    There are many possibilities, from just release this memory to if you 
 really need to you can move this to slower memory.
 
    For example, after a DMRestoreLocalVector() the array memory could be 
 marked as release if you need the space; then on the next DMGetLocalVector() 
 it would check if the memory had been released and allocate it again. If it 
 had not been released then it would just use it.
 
    Clearly you can't just mark memory that is being passed around randomly in 
 user code as release if you need the space, but you can for carefully 
 controlled memory.
 
 I still see no convincing rationale for changing the PetscMalloc interface. 
 If, as you say, few people use it, then there 
 is no reason to change it. We can just change our internal interface and 
 leave that top-level interface alone. Moreover, 
 since none of the interface changes are very specific, I think it needs time 
 to be shaken out. If at the end things get 
 faster and more understandable, we can replace PetscMalloc.
 
Matt
  
 
   Barry
 
 
  On Tue, Apr 28, 2015 at 8:44 PM, Jed Brown j...@jedbrown.org wrote:
  Barry Smith bsm...@mcs.anl.gov writes:
   The special malloc would need to save the locations at which it set
   the addresses and then switch the address to NULL. Then the code
   that used those locations would have to know that they may
   be set to NULL and hence check them before use.
 
  And then PetscMalloc(...,tmp); foo->data = tmp; creates SEGV at some
  unpredictable time.  Awesome!
 
 I am not saying this particular thing would be practical or not,
 just that if we had a concept of a malloc context for each malloc
 there are many games we could try that we couldn't try otherwise and
 this is just one of them.
 
  I'm not convinced, except in the case of mixing in madvise hints.
 
 
 
 
 
 -- 
 What most experimenters take for granted before they begin their experiments 
 is infinitely more interesting than any results to which their experiments 
 lead.
 -- Norbert Wiener



Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-28 Thread Barry Smith

  PetscObject x; it would be problematic if the address x ever changed, because 
copies of that address could be stored all over the place (as references to 
that object), but for the data within an object, such as the array of numerical 
values in a vector or matrix, or indices in an IS, etc., there is generally only 
a single copy of that address, so (except when a Get is outstanding) at least 
in theory that memory can be swapped around without affecting the user (ahh, the 
power of abstraction :-).  You could write some very simple test code such as a 
VecChangeMemory() that allocates new array space and copies the values over, or 
you can even do as we do with GPUs and have multiple array spaces allocated (in 
different kinds) and have VecGetArray(), depending on something, return 
pointers to different ones.

  Barry
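
A sketch of that VecChangeMemory() test, assuming (per its documentation) that
VecReplaceArray() takes ownership of the new array and frees the old one; a
real experiment would swap the PetscMalloc1() for an HBM allocator:

  #include <petscvec.h>

  static PetscErrorCode VecChangeMemory(Vec v)
  {
    PetscErrorCode     ierr;
    PetscInt           n;
    PetscScalar       *newarr;
    const PetscScalar *old;

    PetscFunctionBegin;
    ierr = VecGetLocalSize(v, &n);CHKERRQ(ierr);
    ierr = PetscMalloc1(n, &newarr);CHKERRQ(ierr);   /* swap in an HBM malloc here */
    ierr = VecGetArrayRead(v, &old);CHKERRQ(ierr);
    ierr = PetscMemcpy(newarr, old, n * sizeof(PetscScalar));CHKERRQ(ierr);
    ierr = VecRestoreArrayRead(v, &old);CHKERRQ(ierr);
    ierr = VecReplaceArray(v, newarr);CHKERRQ(ierr); /* Vec now owns newarr */
    PetscFunctionReturn(0);
  }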



 On Apr 28, 2015, at 1:38 AM, Richard Mills r...@utk.edu wrote:
 
 On Mon, Apr 27, 2015 at 12:38 PM, Jed Brown j...@jedbrown.org wrote:
 Richard Mills r...@utk.edu writes:
  I think it is possible to add the memkind support without breaking all of
  the interfaces used throughout PETSc for PetscMalloc(), etc.  I recently
  sat with Chris Cantalupo, the main memkind developer, and walked him
  through PETSc's allocation routines, and we came up with the following: The
  imalloc() function pointer could have an implementation something like
 
  PetscErrorCode PetscMemkindMalloc(size_t size, const char *func,
                                    const char *file, void **result)
  {
    struct memkind *kind;
    int err;

    if (*result == NULL) {
      kind = MEMKIND_DEFAULT;
    } else {
      kind = (struct memkind *)(*result);
    }
 I'm at a loss for words to express how disgusting this is.
 
 Ha ha!  Yeah, I don't like it either.  Chris and I were just thinking about 
 what we could do if we wanted to not break the existing API.  But one of my 
 favorite things about PETSc is that developers are never afraid to make 
 wholesale changes to things.
  
 
  This gives us (1) a method of passing the kind of memory without modifying
  the petsc allocation routine calling sequence,
 
 Nonsense, it just dodges the compiler's ability to tell you about the
 memory errors that it creates at every place where PetscMalloc is
 called!
 
 
 What did Chris say when you asked him about making memkind suck less?
 (Using shorthand to avoid retyping my previous long emails with
 constructive suggestions.)
  
 I had some pretty good discussions with Chris.  He's a very reasonable guy, 
 actually (and unfortunately has just moved to another project, so someone 
 else is going to have to take over memkind ownership).  I summarize the main 
 points (the ones I can recall, anyway) below:
 
 1) Easy one first: Regarding my wish for a call to accurately query the 
 amount of available high-bandwidth memory (MCDRAM), there is currently a 
 memkind_get_size() API but it has the shortcomings of being expensive and not 
 taking into account the heap's free pool (just the memory that the OS knows 
 to be available).  It should be possible to get around the expense of the 
 call with some caching and to include the free pool accounting.  Don't know 
 if any work has been done on this one, yet.
 
 2) Regarding the desire to be able to move pages between kinds of memory 
 while keeping the same virtual address:  This is tough to implement in a way 
 that will give decent performance.  I guess that what we'd really like to 
 have would be an API like
 
   int memkind_convert(memkind_t kind, void *ptr, size_t size);
 
 but the problem with the above is that if the physical backing of a 
 virtual address is being changed, then a POSIX system call has to be made.  
 This also means that a heap management system tracking properties of virtual 
 address ranges for reuse after freeing will require *making a system call to 
 query the properties at the time of the free*.  This kills a lot of the 
 reason for using a heap manager in the first place: avoiding the expense of 
 repeated system calls (otherwise we'd just use mmap() for everything) by 
 reusing memory already obtained from the kernel.
 
 Linux provides the mbind(2) and move_pages(2) system calls that enable the 
 user to modify the backing physical pages of virtual address ranges within 
 the NUMA architecture, so these can be used to move physical pages between 
 NUMA nodes (and high bandwidth on-package memory will be treated as a NUMA 
 node).  (A user on a KNL system could actually use move_pages(2) to move 
 between DRAM and MCDRAM, I believe.)  But Linux doesn't provide an equivalent 
 way for a user to change the page size of the backing physical pages of an 
 address range, so it's not possible to implement the above memkind_convert() 
 with what Linux currently provides.
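
A sketch of the mbind(2) half of that, binding an mmap()ed region to a single
NUMA node (the node number is an assumption; link with -lnuma):

  #include <stdlib.h>
  #include <sys/mman.h>
  #include <numaif.h>   /* mbind(2) */

  /* Allocate `bytes` with pages bound to NUMA node `node` (e.g. the
     on-package fast memory, exposed as a NUMA node as described above). */
  static void *alloc_on_node(size_t bytes, int node)
  {
    unsigned long nodemask = 1UL << node;
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return NULL;
    /* MPOL_BIND: allocations fail rather than spilling to other nodes. */
    if (mbind(p, bytes, MPOL_BIND, &nodemask, sizeof(nodemask) * 8, 0)) {
      munmap(p, bytes);
      return NULL;
    }
    return p;
  }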
 
 If we want to move data from one memory kind to another, I believe that we 
 need to be able to deal with the virtual address changing.  Yes, this is a 
 pain because extra bookkeeping 

Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-28 Thread Jed Brown
Richard Mills r...@utk.edu writes:
 I really like Barry's proposal to add this context.  I can think of other
 things that could go into that context, too, like hints about how the
 memory will be used (passed to the OS via madvise(2), for instance).  

I like this better.  And memkind should really be an enhancement to
posix_madvise.
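
For reference, the posix_madvise() interface Jed points at is tiny; a minimal
sketch:

  #include <stdlib.h>
  #include <sys/mman.h>   /* posix_madvise, POSIX_MADV_* */

  int main(void)
  {
    size_t bytes = (size_t)1 << 28;
    void  *a     = NULL;
    if (posix_memalign(&a, 4096, bytes)) return 1;   /* page-aligned block */
    /* Advisory only: tells the kernel we will sweep through it in order. */
    posix_madvise(a, bytes, POSIX_MADV_SEQUENTIAL);
    /* ... stream through the buffer ... */
    free(a);
    return 0;
  }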




Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-28 Thread Jed Brown
Barry Smith bsm...@mcs.anl.gov writes:
   For things like we talked about the other day: malloc zones,
    maybe some indication that it is OK for the runtime to take back
   this memory if available memory is running low, 

How do you communicate to the accessor that the memory has been freed?

   the ability to turn off read or all access to a malloc zone so that
   another library cannot corrupt the data, etc. When I said
   "independent of memkind" I meant having nothing to do with memkind.

Sure, and I'm not opposed to the concept, but I'd like it to somehow be
based on information that the caller can use and have semantics that are
implementable.  I'm also not wild about the global variables like
PETSC_COMM_WORLD (whose use is pretty much always wrong in library
code), so would like to know how a relevant context would be plumbed
into the caller's scope.

   You are correct that this involves lots of nonlocal information, or information 
 that is not yet known, so the argument cannot be simple flags but must be a 
 context that can be modified at later times.  A crude example:

   malloc(zone1, n,x);

   ZoneSetReadOnly(zone1);

This is implementable, just somewhat expensive.
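
For example, with a zone whose allocations are page-aligned and padded to page
size, the read-only flip could be sketched as follows (the Zone type is
hypothetical):

  #include <stddef.h>
  #include <sys/mman.h>

  typedef struct { void *base; size_t bytes; } Zone;   /* hypothetical */

  /* After this, any write through a pointer into the zone faults, which is
     the "another library cannot corrupt the data" property.  The expense:
     the zone must be page-granular, and each flip is a system call with
     TLB invalidations. */
  static int ZoneSetReadOnly(Zone *z)
  {
    return mprotect(z->base, z->bytes, PROT_READ);
  }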

   ZoneSetAvailableIfNeeded(zone1);

I don't know what semantics this could have that wouldn't be a
programming disaster.




Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-28 Thread Jed Brown
Barry Smith bsm...@mcs.anl.gov writes:
 How do you communicate to the accessor that the memory has been freed?
 
   Accessor? What is accessor?

The code that accesses the memory behind the pointer (via the pointer or
otherwise).




Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-28 Thread Barry Smith

 On Apr 28, 2015, at 5:04 PM, Jed Brown j...@jedbrown.org wrote:
 
 Barry Smith bsm...@mcs.anl.gov writes:
 How do you communicate to the accessor that the memory has been freed?
 
  Accessor? What is accessor?
 
 The code that accesses the memory behind the pointer (via the pointer or
 otherwise).

  The special malloc would need to save the locations at which it set the 
addresses and then switch the address to NULL. Then the code that used those 
locations would have to know that they may be set to NULL and hence 
check them before use.

  I am not saying this particular thing would be practical or not, just that if 
we had a concept of a malloc context for each malloc there are many games we 
could try that we couldn't try otherwise and this is just one of them.

  Barry




Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-28 Thread Jed Brown
Barry Smith bsm...@mcs.anl.gov writes:
   The special malloc would need to save the locations at which it set
   the addresses and then switch the address to NULL. Then the code
   that used those locations would have to know that they may
   be set to NULL and hence check them before use.

And then PetscMalloc(...,tmp); foo->data = tmp; creates SEGV at some
unpredictable time.  Awesome!

   I am not saying this particular thing would be practical or not,
   just that if we had a concept of a malloc context for each malloc
   there are many games we could try that we couldn't try otherwise and
   this is just one of them.

I'm not convinced, except in the case of mixing in madvise hints.




Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-28 Thread Richard Mills
I think this is maybe getting away from the heart of the discussion, but
anyway: Barry, you are talking about being able to mark certain allocated
regions as optional and then have those allocations disappear, right?  I
could see, say, having such an optional allocation that is backed by a
file on some sort of fast storage (some super-fancy SSD or NVRAM) and
whose pages can be evicted if the memory is needed.  I don't know that I
like a pointer to that just being nullified if the eviction occurs,
though.  For a case like this, I'd like to do something like have a Vec and
only have this happen to the array of values if no get is outstanding.
This puts us back with an array of numerical values that could get
swapped around.

On Tue, Apr 28, 2015 at 8:44 PM, Jed Brown j...@jedbrown.org wrote:

 Barry Smith bsm...@mcs.anl.gov writes:
    The special malloc would need to save the locations at which it set
    the addresses and then switch the address to NULL. Then the code
    that used those locations would have to know that they may
    be set to NULL and hence check them before use.

 And then PetscMalloc(...,tmp); foo->data = tmp; creates SEGV at some
 unpredictable time.  Awesome!

I am not saying this particular thing would be practical or not,
just that if we had a concept of a malloc context for each malloc
there are many games we could try that we couldn't try otherwise and
this is just one of them.

 I'm not convinced, except in the case of mixing in madvise hints.



Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-28 Thread Jed Brown
Richard Mills r...@utk.edu writes:

 Really?  That's what I'm asking for.

 Yes, I am ~ 99% sure that this is the case, but I will double-check to make
 sure.

Thanks.

 For small allocations, it doesn't matter where the memory is located
 because it's either in cache or it's not.  From what I hear, KNL's
 MCDRAM won't improve latency, so all such allocations may as well go in
 DRAM anyway.  So all I care about are substantial allocations, like
 matrix and vector data.  It's not expensive to allocate those to align
 with page boundaries (provided they are big enough; coarse grids don't
 matter).

 Yes, MCDRAM won't help with latency, only bandwidth, so for small
 allocations it won't matter.  Following reasoning like what you have above,
 a colleague on my team recently developed an AutoHBW tool for users who
 don't want to modify their code at all.  A user can specify a size
 threshold above which allocations should come from MCDRAM, and then the
 tool interposes on the malloc() (or other allocator) calls to put the small
 stuff in DRAM and the big stuff in MCDRAM.

What's the point?  If you can fit all the large allocations in MCDRAM,
can't you just fit everything in MCDRAM?  Is that so bad?

 I think many users are going to want more control than what something like
 AutoHBW provides, but, as you say, a lot of the time one will only care
 about the substantial allocations for things like matrices and vectors,
 and these also tend to be long lived--plenty of codes will do something
 like allocate a matrix for Jacobians once and keep it around for the
 lifetime of the run.  Maybe we should consider not using a heap manager for
 these allocations, then.  For allocations above some specified threshold,
 perhaps we (PETSc) should simply do the appropriate mmap() and mbind()
 calls to allocate the pages we need in the desired type of memory, and then
 we could use things like move_pages() if/when appropriate (yes, I know
 we don't yet have a good way to make such decisions).  This would mean
 PETSc getting more into the lower level details of memory management, but
 maybe this is appropriate (and unavoidable) as more kinds of
 user-addressable memory get introduced.  I think it is actually less horrible
 than it sounds, because, really, we would just want to do this for the
 largest allocations.  (And this is somewhat analogous to how many malloc()
 implementations work, anyway: Use sbrk() for the small stuff, and mmap()
 for the big stuff.)

I say just use malloc (or posix_memalign) for everything.  PETSc can't
do a better job of the fancy stuff and these normal functions are
perfectly sufficient.

 That is a regression relative to move_pages.  Just make move_pages work.
 That's the granularity I've been asking for all along.

 Cannot practically be done using a heap manager system like memkind.  But
 we can do this if we do our own mmap() calls, as discussed above.

In practice, we would still use malloc(), but set mallopt
M_MMAP_THRESHOLD if needed and call move_pages.  The reality is that
with 4 KiB pages, it doesn't even matter if your large allocation is
not page aligned.  The first and last page don't matter--they're small
enough to be inexpensive to re-fetch from DRAM and don't use up that
much extra space if you map them into MCDRAM.




Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-28 Thread Richard Mills
On Tue, Apr 28, 2015 at 9:35 AM, Jed Brown j...@jedbrown.org wrote:

 Richard Mills r...@utk.edu writes:

[...]

 Linux provides the mbind(2) and move_pages(2) system calls that enable the
  user to modify the backing physical pages of virtual address ranges
 within
  the NUMA architecture, so these can be used to move physical pages
 between
  NUMA nodes (and high bandwidth on-package memory will be treated as a
 NUMA
  node).  (A user on a KNL system could actually use move_pages(2) to move
  between DRAM and MCDRAM, I believe.)

 Really?  That's what I'm asking for.


Yes, I am ~ 99% sure that this is the case, but I will double-check to make
sure.



  But Linux doesn't provide an equivalent way for a user to change the
  page size of the backing physical pages of an address range, so it's
  not possible to implement the above memkind_convert() with what Linux
  currently provides.

 For small allocations, it doesn't matter where the memory is located
 because it's either in cache or it's not.  From what I hear, KNL's
 MCDRAM won't improve latency, so all such allocations may as well go in
 DRAM anyway.  So all I care about are substantial allocations, like
 matrix and vector data.  It's not expensive to allocate those to align
 with page boundaries (provided they are big enough; coarse grids don't
 matter).


Yes, MCDRAM won't help with latency, only bandwidth, so for small
allocations it won't matter.  Following reasoning like what you have above,
a colleague on my team recently developed an AutoHBW tool for users who
don't want to modify their code at all.  A user can specify a size
threshold above which allocations should come from MCDRAM, and then the
tool interposes on the malloc() (or other allocator) calls to put the small
stuff in DRAM and the big stuff in MCDRAM.
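
A toy version of that threshold policy written against the memkind calls
directly (the real AutoHBW interposes on malloc() via LD_PRELOAD; the cutoff
here is an arbitrary assumption):

  #include <stddef.h>
  #include <memkind.h>   /* link with -lmemkind */

  #define HBW_THRESHOLD ((size_t)1 << 20)   /* 1 MiB cutoff, chosen arbitrarily */

  static void *threshold_malloc(size_t size)
  {
    /* Big allocations prefer MCDRAM (falling back to DRAM), small ones DRAM. */
    memkind_t kind = (size >= HBW_THRESHOLD) ? MEMKIND_HBW_PREFERRED
                                             : MEMKIND_DEFAULT;
    return memkind_malloc(kind, size);
  }

  static void threshold_free(void *p)
  {
    memkind_free(NULL, p);   /* NULL kind: memkind determines the owner itself */
  }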

I think many users are going to want more control than what something like
AutoHBW provides, but, as you say, a lot of the time one will only care
about the substantial allocations for things like matrices and vectors,
and these also tend to be long lived--plenty of codes will do something
like allocate a matrix for Jacobians once and keep it around for the
lifetime of the run.  Maybe we should consider not using a heap manager for
these allocations, then.  For allocations above some specified threshold,
perhaps we (PETSc) should simply do the appropriate mmap() and mbind()
calls to allocate the pages we need in the desired type of memory, and then
we could use things like move_pages() if/when appropriate (yes, I know
we don't yet have a good way to make such decisions).  This would mean
PETSc getting more into the lower level details of memory management, but
maybe this is appropriate (and unavoidable) as more kinds of
user-addressable memory get introduced.  I think it is actually less horrible
than it sounds, because, really, we would just want to do this for the
largest allocations.  (And this is somewhat analogous to how many malloc()
implementations work, anyway: Use sbrk() for the small stuff, and mmap()
for the big stuff.)



  If we want to move data from one memory kind to another, I believe that
 we
  need to be able to deal with the virtual address changing.

 That is a regression relative to move_pages.  Just make move_pages work.
 That's the granularity I've been asking for all along.


Cannot practically be done using a heap manager system like memkind.  But
we can do this if we do our own mmap() calls, as discussed above.



  Yes, this is a pain because extra bookkeeping is involved.  Maybe we
  don't want to bother with supporting something like this in PETSc.
  But I don't know of any good way around this.  I have discussed with
  Chris the idea of adding support for asynchronously copying pages
  between different kinds of memory (maybe have a memdup() analog to
  strdup()) and he had some ideas about how this might be done
  efficiently.  But, again, I don't know of a good way to move data to a
  different memory kind while keeping the same virtual address.  If I'm
  misunderstanding something about what is possible with Linux (or other
  *nix), please let me know--I'd really like to be wrong on this.

 Moving memory at page granularity is all you can do.  The hardware
 doesn't support virtual-physical mapping at different granularity, so
 there is no way to preserve address without affecting everything sharing
 that page.  But memkinds only matter for large allocations.

 Is it a showstopper to have different addresses and do full copies?
 It's more of a mess with threads (requires extra
 synchronization/coordination), but it's sometimes (maybe often)
 feasible.  It's certainly ugly and a debugging nightmare (e.g., you'll
 set a location watchpoint and not see where it was modified because it
 was copied out to a different kind).  We'll also need a system for
 eviction.



Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-28 Thread Jed Brown
Richard Mills r...@utk.edu writes:

 I'm at a loss for words to express how disgusting this is.


 Ha ha!  Yeah, I don't like it either.  Chris and I were just thinking about
 what we could do if we wanted to not break the existing API.  

But it DOES BREAK THE EXISTING API!  If you make this change, ALL
EXISTING CODE IS BROKEN and yet broken in a way that the compiler cannot
warn about.  This is literally the worst possible thing.

 What did Chris say when you asked him about making memkind suck less?
 (Using shorthand to avoid retyping my previous long emails with
 constructive suggestions.)


 I had some pretty good discussions with Chris.  He's a very reasonable guy,
 actually (and unfortunately has just moved to another project, so someone
 else is going to have to take over memkind ownership).  I summarize the
 main points (the ones I can recall, anyway) below:

 1) Easy one first: Regarding my wish for a call to accurately query the
 amount of available high-bandwidth memory (MCDRAM), there is currently a
 memkind_get_size() API but it has the shortcomings of being expensive and
 not taking into account the heap's free pool (just the memory that the OS
 knows to be available).  It should be possible to get around the expense of
 the call with some caching and to include the free pool accounting.  Don't
 know if any work has been done on this one, yet.

I don't think this is very useful for multi-process or threaded code
(i.e., all code that might run on KNL) due to race conditions.  Suppose
that 1% of processes get allocation kinds mixed up due to the race
condition and then run 5x slower for the memory-bound phases of the
application.  Have fun load balancing that.  If you want reproducible
performance and/or avoid this load balancing disaster, you need to
either solve the packing problem in a deterministic way or you need to
adaptively modify the policy so that you can fix the low-quality
allocations due to race conditions.

 2) Regarding the desire to be able to move pages between kinds of memory
 while keeping the same virtual address:  This is tough to implement in a
 way that will give decent performance.  I guess that what we'd really like
 to have would be an API like

   int memkind_convert(memkind_t kind, void *ptr, size_t size);

 but the problem with the above is that if the physical backing of a
 virtual address is being changed, then a POSIX system call has to be made.

This interface is too fine-grained in my opinion.

 Linux provides the mbind(2) and move_pages(2) system calls that enable the
 user to modify the backing physical pages of virtual address ranges within
 the NUMA architecture, so these can be used to move physical pages between
 NUMA nodes (and high bandwidth on-package memory will be treated as a NUMA
 node).  (A user on a KNL system could actually use move_pages(2) to move
 between DRAM and MCDRAM, I believe.)  

Really?  That's what I'm asking for.

 But Linux doesn't provide an equivalent way for a user to change the
 page size of the backing physical pages of an address range, so it's
 not possible to implement the above memkind_convert() with what Linux
 currently provides.

For small allocations, it doesn't matter where the memory is located
because it's either in cache or it's not.  From what I hear, KNL's
MCDRAM won't improve latency, so all such allocations may as well go in
DRAM anyway.  So all I care about are substantial allocations, like
matrix and vector data.  It's not expensive to allocate those to align
with page boundaries (provided they are big enough; coarse grids don't
matter).

 If we want to move data from one memory kind to another, I believe that we
 need to be able to deal with the virtual address changing.  

That is a regression relative to move_pages.  Just make move_pages work.
That's the granularity I've been asking for all along.

 Yes, this is a pain because extra bookkeeping is involved.  Maybe we
 don't want to bother with supporting something like this in PETSc.
 But I don't know of any good way around this.  I have discussed with
 Chris the idea of adding support for asynchronously copying pages
 between different kinds of memory (maybe have a memdup() analog to
 strdup()) and he had some ideas about how this might be done
 efficiently.  But, again, I don't know of a good way to move data to a
 different memory kind while keeping the same virtual address.  If I'm
 misunderstanding something about what is possible with Linux (or other
 *nix), please let me know--I'd really like to be wrong on this.

Moving memory at page granularity is all you can do.  The hardware
doesn't support virtual-physical mapping at different granularity, so
there is no way to preserve address without affecting everything sharing
that page.  But memkinds only matter for large allocations.

Is it a showstopper to have different addresses and do full copies?
It's more of a mess with threads (requires extra
synchronization/coordination), but it's sometimes 

Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-28 Thread Richard Mills
On Mon, Apr 27, 2015 at 12:38 PM, Jed Brown j...@jedbrown.org wrote:

 Richard Mills r...@utk.edu writes:
  I think it is possible to add the memkind support without breaking all of
  the interfaces used throughout PETSc for PetscMalloc(), etc.  I recently
  sat with Chris Cantalupo, the main memkind developer, and walked him
  through PETSc's allocation routines, and we came up with the following:
 The
  imalloc() function pointer could have an implementation something like
 
  PetscErrorCode PetscMemkindMalloc(size_t size, const char *func,
                                    const char *file, void **result)
  {
    struct memkind *kind;
    int err;

    if (*result == NULL) {
      kind = MEMKIND_DEFAULT;
    } else {
      kind = (struct memkind *)(*result);
    }

 I'm at a loss for words to express how disgusting this is.


Ha ha!  Yeah, I don't like it either.  Chris and I were just thinking about
what we could do if we wanted to not break the existing API.  But one of my
favorite things about PETSc is that developers are never afraid to make
wholesale changes to things.



  This gives us (1) a method of passing the kind of memory without
 modifying
  the petsc allocation routine calling sequence,

 Nonsense, it just dodges the compiler's ability to tell you about the
 memory errors that it creates at every place where PetscMalloc is
 called!


 What did Chris say when you asked him about making memkind suck less?
 (Using shorthand to avoid retyping my previous long emails with
 constructive suggestions.)


I had some pretty good discussions with Chris.  He's a very reasonable guy,
actually (and unfortunately has just moved to another project, so someone
else is going to have to take over memkind ownership).  I summarize the
main points (the ones I can recall, anyway) below:

1) Easy one first: Regarding my wish for a call to accurately query the
amount of available high-bandwidth memory (MCDRAM), there is currently a
memkind_get_size() API but it has the shortcomings of being expensive and
not taking into account the heap's free pool (just the memory that the OS
knows to be available).  It should be possible to get around the expense of
the call with some caching and to include the free pool accounting.  Don't
know if any work has been done on this one, yet.
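
A sketch of the caching idea (memkind_get_size() is the API named above;
the one-second refresh interval is an arbitrary illustration, and this
version still ignores the heap's free pool):

  #include <memkind.h>
  #include <time.h>

  /* Return a cached estimate of free MCDRAM, refreshing from the
     expensive memkind_get_size() call at most once per second. */
  static int cached_hbw_free(size_t *free_bytes)
  {
    static size_t cached;
    static time_t stamp;
    time_t now = time(NULL);
    if (now - stamp >= 1) {
      size_t total;
      int err = memkind_get_size(MEMKIND_HBW, &total, &cached);
      if (err) return err;
      stamp = now;
    }
    *free_bytes = cached;
    return 0;
  }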

2) Regarding the desire to be able to move pages between kinds of memory
while keeping the same virtual address:  This is tough to implement in a
way that will give decent performance.  I guess that what we'd really like
to have would be an API like

  int memkind_convert(memkind_t kind, void *ptr, size_t size);

but the problem with the above is that if the physical backing of a
virtual address is being changed, then a POSIX system call has to be made.
This also means that a heap management system tracking properties of
virtual address ranges for reuse after freeing will require *making a
system call to query the properties at the time of the free*.  This kills a
lot of the reason for using a heap manager in the first place: avoiding the
expense of repeated system calls (otherwise we'd just use mmap() for
everything) by reusing memory already obtained from the kernel.

Linux provides the mbind(2) and move_pages(2) system calls that enable the
user to modify the backing physical pages of virtual address ranges within
the NUMA architecture, so these can be used to move physical pages between
NUMA nodes (and high bandwidth on-package memory will be treated as a NUMA
node).  (A user on a KNL system could actually use move_pages(2) to move
between DRAM and MCDRAM, I believe.)  But Linux doesn't provide an
equivalent way for a user to change the page size of the backing physical
pages of an address range, so it's not possible to implement the above
memkind_convert() with what Linux currently provides.
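
To make the move_pages(2) option concrete, a sketch of migrating an
already page-aligned array to another NUMA node while keeping its virtual
address (error handling abbreviated; link with -lnuma):

  #include <numaif.h>   /* move_pages(2) */
  #include <stdlib.h>
  #include <unistd.h>

  /* Ask the kernel to move every page of [addr, addr+bytes) to
     target_node (e.g. the MCDRAM node on a KNL system). */
  static long migrate_range(void *addr, size_t bytes, int target_node)
  {
    size_t        pagesize = (size_t)sysconf(_SC_PAGESIZE);
    unsigned long npages   = (bytes + pagesize - 1) / pagesize;
    void        **pages    = malloc(npages * sizeof(*pages));
    int          *nodes    = malloc(npages * sizeof(*nodes));
    int          *status   = malloc(npages * sizeof(*status));
    long          rc;

    for (unsigned long i = 0; i < npages; i++) {
      pages[i] = (char *)addr + i * pagesize;
      nodes[i] = target_node;
    }
    rc = move_pages(0 /* self */, npages, pages, nodes, status, MPOL_MF_MOVE);
    free(pages); free(nodes); free(status);
    return rc;  /* per-page results are left in status[] */
  }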

If we want to move data from one memory kind to another, I believe that we
need to be able to deal with the virtual address changing.  Yes, this is a
pain because extra bookkeeping is involved.  Maybe we don't want to bother
with supporting something like this in PETSc.  But I don't know of any good
way around this.  I have discussed with Chris the idea of adding support
for asynchronously copying pages between different kinds of memory (maybe
have a memdup() analog to strdup()) and he had some ideas about how this
might be done efficiently.  But, again, I don't know of a good way to move
data to a different memory kind while keeping the same virtual address.  If
I'm misunderstanding something about what is possible with Linux (or other
*nix), please let me know--I'd really like to be wrong on this.
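
The synchronous core of such a memdup() would be simple; the hard part is
doing the copy asynchronously and efficiently.  A sketch (memkind_malloc()
is the real memkind call; the helper name is invented):

  #include <memkind.h>
  #include <string.h>

  /* Copy a block into freshly allocated memory of another kind and
     return the new address; the caller must swap its pointer (e.g. a
     Vec's ->array) and free the old block afterwards. */
  static void *memkind_memdup(memkind_t dst_kind, const void *src, size_t bytes)
  {
    void *dst = memkind_malloc(dst_kind, bytes);
    if (dst) memcpy(dst, src, bytes);
    return dst;
  }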

Say that a library is eventually made available that can process all of the
nonlocal information to make reasonable recommendations about where various
data structures should be placed (or, hell, say that there is just an
oracle we can consult about this), but there isn't a good way to do this
while keeping the same virtual address.  Would this be 

Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-28 Thread Richard Mills
On Mon, Apr 27, 2015 at 1:56 PM, Barry Smith bsm...@mcs.anl.gov wrote:


  On Apr 27, 2015, at 2:46 PM, Jed Brown j...@jedbrown.org wrote:
 
  Barry Smith bsm...@mcs.anl.gov writes:
 MPI_Comm argument?  PETSc users rarely need to call PetscMalloc()
 themselves and if they do call it then they should know the
 properties of the memory they are allocating. Most users won't
 even notice the change.
 
  I think that's an exaggeration, but what are you going to use for the
  kind parameter?  The correct value depends on a ton of non-local
  information.
 
Note that I'd like to add this argument independent of memkind.

    For things like we talked about the other day: malloc zones, maybe
 some indication that it is ok for the runtime to take back this memory if
 available memory is running low, the ability to turn off read or all access
 to a malloc zone so that another library cannot corrupt the data, etc. When
 I said independent of memkind I meant having nothing to do with memkind.

    You are correct that this involves lots of nonlocal information, or
 information that is not yet known, so the argument cannot be simple flags
 but must be a context that can be modified at later times.  Crude example:
    malloc(zone1, n, x);

   ZoneSetReadOnly(zone1);
   ..
   ZoneSetAvailableIfNeeded(zone1);
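
Fleshing that crude example out slightly (every name here is invented for
illustration; read-only is implementable today with mprotect(2) on
page-aligned ranges):

  #include <stddef.h>
  #include <sys/mman.h>

  /* A zone remembers the ranges allocated in it so their properties
     can be changed later. */
  typedef struct { void *addr; size_t len; } ZoneRange;
  typedef struct { ZoneRange *ranges; int n; } Zone;

  static int ZoneSetReadOnly(Zone *z)
  {
    for (int i = 0; i < z->n; i++)
      if (mprotect(z->ranges[i].addr, z->ranges[i].len, PROT_READ))
        return -1;
    return 0;
  }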


I really like Barry's proposal to add this context.  I can think of other
things that could go into that context, too, like hints about how the
memory will be used (passed to the OS via madvise(2), for instance).  Sure,
most of the time users won't care to pass such hints and can ignore this,
but it is consistent with the PETSc philosophy of exposing details to users
who care while making a reasonable choice when they don't.
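
As a concrete example of such a hint, a matrix that will be streamed
through once could be advertised to the OS like this (addr and len must
cover whole pages):

  #include <sys/mman.h>

  /* Tell the kernel the range will be read sequentially, so it can
     read ahead aggressively and drop pages behind the sweep. */
  static int hint_sequential(void *addr, size_t len)
  {
    return madvise(addr, len, MADV_SEQUENTIAL);
  }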




   Barry


 
  What are you going to use it for?  If the allocation is small enough,
  it'll probably be resident in cache and if it falls out, the lower
  latency to DRAM will be better than HBM.  As it gets bigger, provided it
  gets enough use, then HBM becomes the right place, but later it's too
  big and you have to go back to DRAM.
 
  What happens if memory of the kind requested is unavailable?  Error or
  the implementations tries to find a different kind?  If there are
  several memory kinds, what order is used when checking?


The questions in the second paragraph may be something worth enabling the
user to set, either through some global preference or particular entries in
the context.


Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-27 Thread Jed Brown
Richard Mills r...@utk.edu writes:
 I think it is possible to add the memkind support without breaking all of
 the interfaces used throughout PETSc for PetscMalloc(), etc.  I recently
 sat with Chris Cantalupo, the main memkind developer, and walked him
 through PETSc's allocation routines, and we came up with the following: The
 imalloc() function pointer could have an implementation something like

 PetscErrorCode PetscMemkindMalloc(size_t size, const char *func, const char *file, void **result)
 {
   struct memkind *kind;
   int err;
 
   if (*result == NULL) {
     kind = MEMKIND_DEFAULT;
   } else {
     kind = (struct memkind *)(*result);

I'm at a loss for words to express how disgusting this is.

 This gives us (1) a method of passing the kind of memory without modifying
 the petsc allocation routine calling sequence, 

Nonsense, it just dodges the compiler's ability to tell you about the
memory errors that it creates at every place where PetscMalloc is
called!
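
For illustration, this is what the in/out convention asks of callers (n is
some byte count; nothing here is checked by the compiler):

  void *p = (void *)MEMKIND_HBW;                  /* smuggle the kind in */
  PetscMemkindMalloc(n, __func__, __FILE__, &p);  /* p now points to the block */

A caller who forgets to initialize p gets leftover garbage reinterpreted
as a memkind_t, and no type error is ever reported.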


What did Chris say when you asked him about making memkind suck less?
(Using shorthand to avoid retyping my previous long emails with
constructive suggestions.)

 and (2) support a fallback code path for legacy applications which will
 not set the pointer to NULL.  Or am I missing something?





Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-27 Thread Barry Smith

 On Apr 27, 2015, at 2:30 PM, Matthew Knepley knep...@gmail.com wrote:
 
 On Tue, Apr 28, 2015 at 5:26 AM, Barry Smith bsm...@mcs.anl.gov wrote:
 
  On Apr 27, 2015, at 1:51 PM, Richard Mills r...@utk.edu wrote:
 
  All,
 
  I'd like to add support for the allocators provided by the 'memkind' 
  library (https://github.com/memkind/memkind).  I've discussed memkind a 
  little bit with some of you off-list.  Briefly, memkind provides a user 
  extensible heap manager built on top of jemalloc which enables control of 
  memory characteristics and a partitioning of the heap between kinds of 
  memory.  The immediate motivation is to support placement of critical data 
  structures into the high bandwidth on-package memory that will be available 
  with Intel's Knights Landing generation of Xeon Phi processor (like on 
  the upcoming NERSC Cori machine), but what the library provides is more 
  general, and it can also be used for placing data in memory such as 
  nonvolatile RAM (NVRAM), which will be appearing in more systems.
 
  I'm with Jed in thinking that, ideally, PETSc (or its users) shouldn't have 
  to make decisions about the optimal way to solve the packing problem of 
  what should go into high-bandwidth memory.  (In fact, I think this is a 
  really interesting research problem that relates to some work on 
  memory-adaptation in scientific applications that I did back when I was 
  doing my Ph.D. research, e.g., 
  http://www.climatemodeling.org/~rmills/pubs/JGC_mmlib_2007.pdf.)  However, 
  right now I'd like to take the baby step of coming up with a mechanism to 
  simply tag PETSc objects with a kind of memory that is preferred, and then 
  having associated allocations reflect that preference (or requirement, if 
  the user wants allocations to fail if such memory is not available).  Later 
  we can worry about how to move data structures in and out of a kind of 
  memory.
 
  It might make sense to add an option for certain PETSc classes--Mat and Vec 
  are the most obvious here--to prefer allocations in a certain kind of 
  memory.  Or, would it make more sense to add such an option at the 
  PetscObject level?
 
  I think it is possible to add the memkind support without breaking all of 
  the interfaces used throughout PETSc for PetscMalloc(), etc.
 
   I don't think having this as a goal is useful at all! Just break the 
 current interface; add an abstract memkind argument to all PetscMalloc() 
 and Free() calls that indicates any additional information about the memory 
 requested. By making it abstract it will always just be there and on 
 systems without any special memory options it just doesn't do anything.
 
 Since Malloc() is so pervasive, I think it would be useful to have a 2-level 
 interface here. The standard Malloc() would call your advanced 
 PlacedMalloc(), and anyone could call that function, but I think it's just 
 cruel to make everyone allocating memory give arguments they do not 
 understand or need.

MPI_Comm argument?  PETSc users rarely need to call PetscMalloc() 
themselves and if they do call it then they should know the properties of the 
memory they are allocating. Most users won't even notice the change. 

   Note that I'd like to add this argument independent of memkind. 

   Barry

 
   Matt
  
Barry
 
Note: it is not clear to me how this could be helpful on its own because I 
agree with Jed: how is the user, when creating the object, supposed to know 
the optimal place to put it?  For more complex objects it may be that 
different parts of the object would be stored in different types of memory, 
etc.
 
 
   I recently sat with Chris Cantalupo, the main memkind developer, and 
  walked him through PETSc's allocation routines, and we came up with the 
  following: The imalloc() function pointer could have an implementation 
  something like
 
  PetscErrorCode PetscMemkindMalloc(size_t size, const char *func, const char *file, void **result)
  {
    struct memkind *kind;
    int err;
 
    if (*result == NULL) {
      kind = MEMKIND_DEFAULT;
    } else {
      kind = (struct memkind *)(*result);
    }
 
    err = memkind_posix_memalign(kind, result, 16, size);
    return PosixErrToPetscErr(err);
  }
 
  and ifree will look something like:
 
  PetscErrorCode PetscMemkindFree(void *ptr, int a, const char *func, const char *file)
  {
    memkind_free(0, ptr);
    return 0;
  }
 
 
  This gives us (1) a method of passing the kind of memory without modifying 
  the PETSc allocation routine calling sequence, and (2) support a fallback 
  code path for legacy applications which will not set the pointer to NULL.  Or 
  am I missing something?
 
  Thoughts?  I'd like to hash out something soon and start writing some code.
 
  --Richard
 
 
 
 
 -- 
 What most experimenters take for granted before they begin their experiments 
 is infinitely more interesting than any results to which their experiments 
 lead. 
 -- Norbert Wiener

Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-27 Thread Barry Smith

 On Apr 27, 2015, at 1:51 PM, Richard Mills r...@utk.edu wrote:
 
 All,
 
 I'd like to add support for the allocators provided by the 'memkind' library 
 (https://github.com/memkind/memkind).  I've discussed memkind a little bit 
 with some of you off-list.  Briefly, memkind provides a user extensible heap 
 manager built on top of jemalloc which enables control of memory 
 characteristics and a partitioning of the heap between kinds of memory.  The 
 immediate motivation is to support placement of critical data structures into 
 the high bandwidth on-package memory that will be available with Intel's 
 Knights Landing generation of Xeon Phi processor (like on the upcoming 
 NERSC Cori machine), but what the library provides is more general, and it 
 can also be used for placing data in memory such as nonvolatile RAM (NVRAM), 
 which will be appearing in more systems.
 
 I'm with Jed in thinking that, ideally, PETSc (or its users) shouldn't have 
 to make decisions about the optimal way to solve the packing problem of 
 what should go into high-bandwidth memory.  (In fact, I think this is a 
 really interesting research problem that relates to some work on 
 memory-adaptation in scientific applications that I did back when I was doing 
 my Ph.D. research, e.g., 
 http://www.climatemodeling.org/~rmills/pubs/JGC_mmlib_2007.pdf.)  However, 
 right now I'd like to take the baby step of coming up with a mechanism to 
 simply tag PETSc objects with a kind of memory that is preferred, and then 
 having associated allocations reflect that preference (or requirement, if the 
 user wants allocations to fail if such memory is not available).  Later we 
 can worry about how to move data structures in and out of a kind of memory.
 
 It might make sense to add an option for certain PETSc classes--Mat and Vec 
 are the most obvious here--to prefer allocations in a certain kind of memory. 
  Or, would it make more sense to add such an option at the PetscObject level?
 
 I think it is possible to add the memkind support without breaking all of the 
 interfaces used throughout PETSc for PetscMalloc(), etc.

  I don't think having this as a goal is useful at all! Just break the current 
interface; add an abstract memkind argument to all PetscMalloc() and Free() 
calls that indicates any additional information about the memory requested. By 
making it abstract it will always just be there and on systems without any 
special memory options it just doesn't do anything.

   Barry

   Note: it is not clear to me how this could be helpful on its own because I 
agree with Jed: how is the user, when creating the object, supposed to know 
the optimal place to put it?  For more complex objects it may be that 
different parts of the object would be stored in different types of memory, 
etc.


  I recently sat with Chris Cantalupo, the main memkind developer, and walked 
 him through PETSc's allocation routines, and we came up with the following: 
 The imalloc() function pointer could have an implementation something like
 
 PetscErrorCode PetscMemkindMalloc(size_t size, const char *func, const char *file, void **result)
 {
   struct memkind *kind;
   int err;
 
   if (*result == NULL) {
     kind = MEMKIND_DEFAULT;
   } else {
     kind = (struct memkind *)(*result);
   }
 
   err = memkind_posix_memalign(kind, result, 16, size);
   return PosixErrToPetscErr(err);
 }
 
 and ifree will look something like:
 
 PetscErrorCode PetscMemkindFree(void *ptr, int a, const char *func, const char *file)
 {
   memkind_free(0, ptr);
   return 0;
 }
 
  
 This gives us (1) a method of passing the kind of memory without modifying 
 the PETSc allocation routine calling sequence, and (2) support a fallback 
 code path for legacy applications which will not set the pointer to NULL.  Or am 
 I missing something?
 
 Thoughts?  I'd like to hash out something soon and start writing some code.
 
 --Richard



Re: [petsc-dev] Adding support memkind allocators in PETSc

2015-04-27 Thread Matthew Knepley
On Tue, Apr 28, 2015 at 5:26 AM, Barry Smith bsm...@mcs.anl.gov wrote:


  On Apr 27, 2015, at 1:51 PM, Richard Mills r...@utk.edu wrote:
 
  All,
 
  I'd like to add support for the allocators provided by the 'memkind'
 library (https://github.com/memkind/memkind).  I've discussed memkind a
 little bit with some of you off-list.  Briefly, memkind provides a user
 extensible heap manager built on top of jemalloc which enables control of
 memory characteristics and a partitioning of the heap between kinds of
 memory.  The immediate motivation is to support placement of critical data
 structures into the high bandwidth on-package memory that will be available
 with Intel's Knights Landing generation of Xeon Phi processor (like on
 the upcoming NERSC Cori machine), but what the library provides is more
 general, and it can also be used for placing data in memory such as
 nonvolatile RAM (NVRAM), which will be appearing in more systems.
 
  I'm with Jed in thinking that, ideally, PETSc (or its users) shouldn't
 have to make decisions about the optimal way to solve the packing problem
 of what should go into high-bandwidth memory.  (In fact, I think this is a
 really interesting research problem that relates to some work on
 memory-adaptation in scientific applications that I did back when I was
 doing my Ph.D. research, e.g.,
 http://www.climatemodeling.org/~rmills/pubs/JGC_mmlib_2007.pdf.)
 However, right now I'd like to take the baby step of coming up with a
 mechanism to simply tag PETSc objects with a kind of memory that is
 preferred, and then having associated allocations reflect that preference
 (or requirement, if the user wants allocations to fail if such memory is
 not available).  Later we can worry about how to move data structures in
 and out of a kind of memory.
 
  It might make sense to add an option for certain PETSc classes--Mat and
 Vec are the most obvious here--to prefer allocations in a certain kind of
 memory.  Or, would it make more sense to add such an option at the
 PetscObject level?
 
  I think it is possible to add the memkind support without breaking all
 of the interfaces used throughout PETSc for PetscMalloc(), etc.

   I don't think having this as a goal is useful at all! Just break the
 current interface; add an abstract memkind argument to all PetscMalloc()
 and Free() calls that indicates any additional information about the memory
 requested. By making it abstract it will always just be there and on
 systems without any special memory options it just doesn't do anything.


Since Malloc() is so pervasive, I think it would be useful to have a 2-level
interface here. The standard Malloc() would call your advanced
PlacedMalloc(), and anyone could call that function, but I think it's just
cruel to make everyone allocating memory give arguments they do not
understand or need.
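
Something like this two-level shape, to sketch it (all names invented for
illustration):

  /* Advanced entry point: takes an explicit placement context. */
  PetscErrorCode PetscPlacedMalloc(size_t size, PetscMemZone *zone, void **result);

  /* Everyday entry point: forwards a default context, so most callers
     never see the extra argument. */
  #define PetscMalloc(size, result) \
          PetscPlacedMalloc((size), PETSC_MEMZONE_DEFAULT, (void **)(result))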

  Matt


Barry

Note: it is not clear to me how this could be helpful on its own
 because I agree with Jed: how is the user, when creating the object,
 supposed to know the optimal place to put it?  For more complex objects it
 may be that different parts of the object would be stored in different
 types of memory, etc.


   I recently sat with Chris Cantalupo, the main memkind developer, and
 walked him through PETSc's allocation routines, and we came up with the
 following: The imalloc() function pointer could have an implementation
 something like
 
  PetscErrorCode PetscMemkindMalloc(size_t size, const char *func, const char *file, void **result)
  {
    struct memkind *kind;
    int err;
 
    if (*result == NULL) {
      kind = MEMKIND_DEFAULT;
    } else {
      kind = (struct memkind *)(*result);
    }
 
    err = memkind_posix_memalign(kind, result, 16, size);
    return PosixErrToPetscErr(err);
  }
 
   and ifree will look something like:
 
  PetscErrorCode PetscMemkindFree(void *ptr, int a, const char *func, const char *file)
  {
    memkind_free(0, ptr);
    return 0;
  }
 
 
  This gives us (1) a method of passing the kind of memory without
 modifying the PETSc allocation routine calling sequence, and (2) support a
 fallback code path for legacy applications which will not set the pointer to
 NULL.  Or am I missing something?
 
  Thoughts?  I'd like to hash out something soon and start writing some
 code.
 
  --Richard




-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener