Re: [OctDev] request an octave-forge account, functions for parallel computation in a cluster

Olaf Till Thu, 26 Mar 2009 10:12:55 -0700

On Thu, Mar 26, 2009 at 05:10:45PM +0100, Jaroslav Hajek wrote:
> 2009/3/26 Olaf Till:
> > On Thu, Mar 26, 2009 at 03:22:59PM +0100, Jaroslav Hajek wrote:
> >> On Thu, Mar 26, 2009 at 3:00 PM, Jaroslav Hajek wrote:
> >> > 2009/3/26 Olaf Till:
> >> >> On Thu, Mar 26, 2009 at 09:21:38AM +0100, Olaf Till wrote:
> >> >>>
> >> >>> ...
> >> >>>
> >> >>> Looking at the parcellfun stuff, I noticed that there are still more
> >> >>> functions for sending and receiving Octave variables in octave-forge:
> >> >>> fsave and fload. Some time ago there was a discussion on the
> >> >>> octave-maintainers list about saving and loading over streams. After
> >> >>> getting some hints, I posted psend and prcv, which can install/read
> >> >>> Octave variables to/from memory as well as directly return/take
> >> >>> their contents, which also read/write a binary header (to take care
> >> >>> of endianness, as I understood it), and which distinguish EOF at the
> >> >>> start of reading from EOF later on. The discussion dried up; probably
> >> >>> the solution was not general enough and not obviously the right one
> >> >>> for Octave. For the code now contributed, the functionality of fsave
> >> >>> and fload would probably be sufficient, if only endianness were taken
> >> >>> care of by writing/reading a header (variables are now sent between
> >> >>> different machines). If this were done, I could rewrite my code, and
> >> >>> then there would probably be no reason to keep psend and prcv in the
> >> >>> package.
> >> >>
> >> >> Jaroslav,
> >> >>
> >> >> would you accept the attached small patches for fsave.cc and fload.cc,
> >> >> to make them robust for use between different machines?
> >> >>
> >> >
> >> > I actually didn't intend fsave and fload to work across machines
> >> > (parcellfun uses them just for pipes). But I'm not against it. I want,
> >> > however, to avoid the overhead of sending the binary header with each
> >> > variable.
> >> >
> >> > I see several options:
> >> > 1. use explicit arch specs (like fwrite, fread)
> >> > 2. use explicit option to control whether headers are used (but then
> >> > you can as well use different functions)
> >> > 3. cache the stream pointer in both fload/fsave and only write the
> >> > header on first save / attempt to read it on first load. This is
> >> > possible because octave_stream has reference counting, ensuring a
> >> > cached object remains valid.
> >> > The price to pay is one dangling closed file (but no memory leaks),
> >> > which is easily acceptable.
> >> >
> >>
> >> Umm, I just realized this is not really that easy if you use several
> >> streams simultaneously. So the question is still open. I
> >> don't see any reasonably simple way to achieve it, other than
> >> maintaining a map of open stream numbers which already received the
> >> header.
> >> Specifying architecture explicitly is another option - or just leaving
> >> everything in the present state.
> >
> > Well, I didn't want to make things more difficult. But I thought there
> > may be more people wanting to save/load to/from streams, and something
> > general for all would be good. The names "fsave" and "fload" sound
> > very general. But maybe this is not so easy and we should use
> > different functions for different purposes.
> >
> 
> Yes, I agree with the motivation, but the price seems too high for
> parcellfun. I really want to avoid putting the binary header in front
> of every variable, because, say, if parcellfun sends mere scalars, it
> will increase the traffic several times. fload is actually a hybrid
> between fread, which requires explicit specification of arch, and
> load, which detects it. Similarly for fsave.
> My best thought right now is to add fwriteheader and freadheader to
> communicate the architecture first (if needed), then use that in
> fload. It would be nice to allow that information to be cached somehow
> with the stream number, but I don't see a good way unless we modify
> Octave.


If it can't be cached, it's probably really better to use a different
function which writes (or reads) both header and data. In the long run
I think it's better to have such a function in main/general. It could
even be more general, allowing variables to be installed into memory. I
think I will stick to my psend and prcv and leave them with my code
for the present, for internal use, and maybe I can polish them up
later.
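A minimal Python sketch of the two approaches being weighed in this thread, for illustration only: it is neither Octave's binary save format nor the psend/prcv code. One writer puts a small byte-order header in front of every variable; the other keeps a map of stream numbers that have already received the header, as Jaroslav suggested, and sends it only once. The 4-byte header, the stream id 7, and the names send_scalar, recv_scalar and HeaderCache are all hypothetical, and only double scalars are handled.

```python
import io
import struct
import sys

# Hypothetical 4-byte header: a magic tag plus one byte-order flag.
# This is NOT Octave's binary save header, just an illustration.
MAGIC = b"OCT"
LITTLE, BIG = b"L", b"B"
HEADER_LEN = len(MAGIC) + 1
VALUE_LEN = 8  # one IEEE double


def write_header(stream):
    """Announce the writer's native byte order."""
    stream.write(MAGIC + (LITTLE if sys.byteorder == "little" else BIG))


def read_header(stream):
    """Return a struct byte-order prefix, or None on EOF before any byte."""
    hdr = stream.read(HEADER_LEN)
    if not hdr:
        return None  # clean EOF: nothing more was sent
    flag = hdr[len(MAGIC):]
    if len(hdr) != HEADER_LEN or hdr[:len(MAGIC)] != MAGIC or flag not in (LITTLE, BIG):
        raise ValueError("truncated or invalid header")
    return "<" if flag == LITTLE else ">"


def read_value(stream, order):
    """Read one double; None on clean EOF, EOFError if cut off mid-value."""
    data = stream.read(VALUE_LEN)
    if not data:
        return None
    if len(data) != VALUE_LEN:
        raise EOFError("EOF in the middle of a variable")
    return struct.unpack(order + "d", data)[0]


def send_scalar(stream, x):
    """psend-like: a header in front of every variable."""
    write_header(stream)
    stream.write(struct.pack("=d", x))  # native byte order, as announced


def recv_scalar(stream):
    """Counterpart of send_scalar; None only on EOF before a header."""
    order = read_header(stream)
    if order is None:
        return None
    value = read_value(stream, order)
    if value is None:
        raise EOFError("header without data")
    return value


class HeaderCache:
    """Header written/read only once per (hypothetical) stream number."""

    def __init__(self):
        self.sent = set()  # writer side: stream ids that got the header
        self.order = {}    # reader side: stream id -> byte-order prefix

    def send_scalar(self, sid, stream, x):
        if sid not in self.sent:
            write_header(stream)
            self.sent.add(sid)
        stream.write(struct.pack("=d", x))

    def recv_scalar(self, sid, stream):
        if sid not in self.order:
            order = read_header(stream)
            if order is None:
                return None
            self.order[sid] = order
        return read_value(stream, self.order[sid])


values = (1.0, 2.5, -3.0)
per_var, cached = io.BytesIO(), io.BytesIO()
writer = HeaderCache()
for v in values:
    send_scalar(per_var, v)
    writer.send_scalar(7, cached, v)
print(len(per_var.getvalue()), len(cached.getvalue()))  # 36 28
```

With three scalars the per-variable header is paid three times, the cached one once; the smaller the variables, the larger the relative overhead. In this sketch the cache is only correct as long as a stream number is not reused: an entry would have to be dropped when its stream is closed, otherwise a new stream with the same number would silently skip the header.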

> 
> > select.cc: current code is attached.
> >
> 
> OK
> 
> > _exit.cc: also attached, but maybe you expect too much; it's only a
> > trivial one-liner.
> 
> Yeah, that's expected; I was just too lazy to write it myself. But I think
> the function should be named __exit__ for consistency with other
> internal functions that are intended for hackers' use.

That's OK, I will change calls of _exit() to __exit__() in my code.

> 
> > functions for advisory locking: I think I will postpone rewriting these
> > and include them in the package for the present. I still think that,
> > apart from using Octave streams, they should be more general to be a
> > solution for everyone.
> >
> >
> > The question is still open whether I should make scheduling remote
> > function execution a separate package or maybe combine it with
> > "parallel".
> 
> Your choice. I agree with Soren; it seems to me that the "parallel"
> package is more or less orphaned (according to SVN log the author
> didn't contribute anything for 3 years), so if your package
> supersedes, at least approximately, the functionality of parallel and
> you're willing to maintain it, that seems to me enough to just replace
> the contents of "parallel".

Actually it does not supersede enough; it is really different. I would
not replace "parallel" before there is something equivalent. But after
your comment I now feel bold enough to put my code into the "parallel"
package alongside the original code, if its author does not object.

> 
> > Actually I think that "parallel" would also be a suitable
> > place for parcellfun, since users will first look there for such
> > functionality.
> >
> 
> I considered that option, but decided to go with general which I
> maintain, partly because "parallel" seems orphaned.
> Also, parcellfun is more special in the sense that it should work more
> or less as a drop-in replacement for cellfun, out of the box, no
> setup, no manual launching of processes or similar, which, as I
> understand it, is not true for the other packages. But of course its
> applicability is limited to multicore machines, and probably just
> Linux-like.

I would then add a hint in parallel/doc that such a function exists
in the general package.

Thanks for the comments.

Olaf

------------------------------------------------------------------------------
_______________________________________________
Octave-dev mailing list
https://lists.sourceforge.net/lists/listinfo/octave-dev
