On Wed, Aug 11, 2010 at 04:08:08PM +0200, Olaf Till wrote:
> On Wed, Aug 11, 2010 at 05:13:22PM +0530, Vipul Agrawal wrote:
> > On Wed, Aug 11, 2010 at 4:45 PM, Olaf Till wrote:
> >
> > > On Wed, Aug 11, 2010 at 11:51:43AM +0530, Vipul Agrawal wrote:
> > > > >On Sat, Aug 07, 2010 at 05:31:24PM +1100, Lutaev D. A. wrote:
> > > > >> We used parallel-2.0.2 and we have problems with such code:
> > > > >>
> > > > >> clear;
> > > > >>
> > > > >> hosts = [];
> > > > >>
> > > > >> for i = 1:nargin
> > > > >> hosts = [hosts; argv(){i, 1}];
> > > > >> end
> > > > >>
> > > > >> hosts
> > > > >>
> > > > >> sockets = connect(hosts)
> > > > >>
> > > > >> x = rand(50, 1000);
> > > > >>
> > > > >> send(x, sockets(2, :));
> > > > >> reval("x = recv(sockets(1, :))", sockets(2, :));
> > > > >> scloseall(sockets);
> > > > >>
> > > > >> Program stucks when it's trying to send x from sockets(1, :)
> > > > >> (master) to slave (sockets(2, :)).
> > > > >
> > > > >As I said, I'm unable to reproduce the problem. Maybe it won't help,
> > > > >but why don't you send a real session transcript (cut-and-paste from
> > > > >your terminal running Octave) and indicate exactly the command which
> > > > >"stucks"? Commands which you only intended to give are of no use to
> > > > >me. Since I have no notion as yet what the cause of the problem is,
> > > > >the contents of the variables "hosts" and "sockets" may be important;
> > > > >why don't you show them? Of course you should hide the real hostnames,
> > > > >but I have to see whether they are different, whether the local
> > > > >machine is among the servers, and what the length of the hostnames is.
> > > > >
> > > > >Do you use Octave-3.2.4 and parallel-2.0.2 on _all_ machines?
> > > > >
> > > > >What you still can do is to check whether the server process and child
> > > > >process are running before and after the "stucking" command (on each
> > > > >server machine: ps ax | grep octave and post the output (replacing
> > > > >hostnames, of course)).
> > > > >
> > > > >Olaf
> > > > >
> > > >
> > > > I am using octave-3.2.4 from the maverick repo and parallel-2.0.2 built
> > > > from source. I am also getting the same issue with big matrices.
> > > > I could not send more than 32767 elements (2^15-1) of type double
> > > > (8 bytes each), i.e. 262136 bytes.
> > > > The reason may be an incorrect buffer size.
> > > > The bufsize in pserver.cc, line 507, is:
> > > > int bufsize = 262144;
> > > > A possible solution is to change it to
> > > > int bufsize = BUFF_SIZE;
> > > >
> > > > Now, the no. of elements increases to about 46k, which interestingly
> > > > comes out to be a magic number equal to 2^15 * sqrt(2). Quite Amazing!
> > > > I think there is still some other issue which stalls sending matrices
> > > > larger than this size.
> > > >
> > > > -Vipul
> > >
> > > "send" does not return until the whole value is written to the
> > > socket. If the value's length exceeds the socket's buffer size, a
> > > process at the far end of the connection must read data for "send"
> > > to be able to return. So before "send" in the master process, one must
> > > first start "recv" on the other end, e.g.:
> > >
> > > octave:13> reval ("send (recv (sockets(1, :)), sockets(1, :))", sockets(2, :))
> > > octave:14> send (ones (100, 1000000), sockets(2, :))
> > > octave:15> size (recv (sockets(2, :)))
> > > ans =
> > >
> > >    100   1000000
> > >
> > > octave:16>
> > >
> > > I don't know why the socket's buffer size for outgoing connections has
> > > been set to a lower value than for incoming connections in pserver.cc;
> > > this probably should be corrected, since BUFF_SIZE (the higher value)
> > > is considered by send.cc.
> > > But this should not be essential (only a
> > > matter of efficiency).
> > >
> > > Thanks for the report.
> > >
> > > Olaf
> > >
> >
> > Hi Olaf,
> > Your trick works fine. Now, I am able to send and receive large matrices.
> > But it seems that the data becomes corrupted during transfer.
> > For example:
> > hosts = ['host1';'host2';'host3'];
> > sockets = connect(hosts)
> >
> > sockets =
> >
> >      0     0     0
> >     13    11  1234
> >     14    12  1234
> >
> > a = rand(50);
> > reval("temp = recv(sockets(1,:));",sockets(2,:));
> > send(a,sockets(2,:));
> > reval("send(temp,sockets(1,:))",sockets(2,:));
> > b = recv(sockets(2,:));
> > isequal(a,b)
> >
> > ans = 0
> >
> > octave:> a - b
> > Columns 1-7 are all zeros. Columns 9-50 are all non-zero.
> > Columns 6 through 10:
> >
> >    0.0000e+00    0.0000e+00    0.0000e+00    6.1764e+21    2.0570e-01
> >    0.0000e+00    0.0000e+00    0.0000e+00    3.4265e-02   -2.3695e+35
> >    0.0000e+00    0.0000e+00    0.0000e+00    2.0374e+20  -1.0870e+267
> >    0.0000e+00    0.0000e+00    0.0000e+00    9.0336e-01  -2.9740e+207
> >    0.0000e+00    0.0000e+00    0.0000e+00    5.2625e-01    2.1413e-01
> >    0.0000e+00    0.0000e+00    0.0000e+00    8.7713e-01  -1.8943e+280
> >    0.0000e+00    0.0000e+00    0.0000e+00   -2.1483e+70    9.9818e+50
> >    0.0000e+00    0.0000e+00    0.0000e+00    7.4401e-01    6.9315e+09
> >    0.0000e+00    0.0000e+00    0.0000e+00    5.0168e-01    7.7556e-01
> >    0.0000e+00    0.0000e+00    0.0000e+00  -1.1288e+161    1.0402e-01
> >    0.0000e+00    0.0000e+00   -4.1461e+32    8.0617e-01   1.1567e+134
> >    0.0000e+00    0.0000e+00   1.7965e+251  -2.4074e+254   3.1141e+147
> >    0.0000e+00    0.0000e+00    1.5704e+79    4.5556e+69    8.1072e-01
> >    0.0000e+00    0.0000e+00   -6.7451e+37    1.1328e-01    1.0592e+83
> >    0.0000e+00    0.0000e+00    7.5392e-01   -4.6222e+83    3.1915e-01
> > .....
> >
> > Has anybody else reproduced a similar problem?
>
> Not me --- for me, "a" and "b" are identical.
>
> Several thoughts:
>
> - While Octave, when saving and loading data, is AFAIK supposed to
>   take care of endian-ness and possible differences in floating point
>   format, "send" and "recv" do not use the saving and loading
>   functionality of Octave, and only consider endian-ness.
>
> - It is difficult to rewrite "send" and "recv" to use the above
>   functionality of Octave, since the latter is based on Octave's stream
>   ids, which know nothing of the "externally" allocated sockets.

Luckily, it was not so difficult. "send" and "recv" have now been
rewritten, and the new package (2.0.3) has been submitted for release.
Data exchange is now probably less error-prone. Could you check whether
the new package solves your problem?

Olaf
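
The blocking behaviour of "send" described earlier in this thread (a
write larger than the socket buffer cannot finish until the far end
reads) is a general property of stream sockets and can be reproduced
without Octave. The following is only a minimal Python sketch under
stated assumptions: it uses a local socket pair instead of the parallel
package's connections, the function names send_without_reader and
send_with_reader are hypothetical, and the 8 MiB payload is assumed to
be larger than the default socket buffer of the operating system.

```python
import socket
import threading


def send_without_reader(payload: bytes, timeout: float = 0.5) -> bool:
    """Try to send while nobody reads. Returns True if the send completed."""
    a, b = socket.socketpair()
    a.settimeout(timeout)  # without a timeout, sendall would block forever
    try:
        a.sendall(payload)
        return True
    except socket.timeout:
        # The buffer filled up and no peer drained it.
        return False
    finally:
        a.close()
        b.close()


def send_with_reader(payload: bytes) -> bytes:
    """Start the receiver first, then send. Returns the bytes received."""
    a, b = socket.socketpair()
    received = bytearray()

    def reader() -> None:
        # Drain the socket until the whole payload has arrived.
        while len(received) < len(payload):
            chunk = b.recv(65536)
            if not chunk:
                break
            received.extend(chunk)

    t = threading.Thread(target=reader)
    t.start()            # the analogue of starting "recv" on the other end
    a.sendall(payload)   # can now complete, since the reader empties the buffer
    t.join()
    a.close()
    b.close()
    return bytes(received)


if __name__ == "__main__":
    data = b"x" * (8 * 1024 * 1024)  # assumed to exceed the socket buffer
    print("completed without reader:", send_without_reader(data))
    print("round trip intact:", send_with_reader(data) == data)
```

The second function mirrors the order recommended above: the reading
side is started before the large write is issued. It also compares the
received bytes with the original, which is the same kind of round-trip
check as isequal(a,b) in the session quoted above.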
