On Wed, Aug 11, 2010 at 05:13:22PM +0530, Vipul Agrawal wrote:
> On Wed, Aug 11, 2010 at 4:45 PM, Olaf Till <[email protected]> wrote:
>
> > On Wed, Aug 11, 2010 at 11:51:43AM +0530, Vipul Agrawal wrote:
> > > > On Sat, Aug 07, 2010 at 05:31:24PM +1100, Lutaev D. A. wrote:
> > > > > We used parallel-2.0.2 and we have problems with such code:
> > > > >
> > > > > clear;
> > > > >
> > > > > hosts = [];
> > > > >
> > > > > for i = 1:nargin
> > > > >   hosts = [hosts; argv(){i, 1}];
> > > > > end
> > > > >
> > > > > hosts
> > > > >
> > > > > sockets = connect(hosts)
> > > > >
> > > > > x = rand(50, 1000);
> > > > >
> > > > > send(x, sockets(2, :));
> > > > > reval("x = recv(sockets(1, :))", sockets(2, :));
> > > > > scloseall(sockets);
> > > > >
> > > > > The program gets stuck when it is trying to send x from
> > > > > sockets(1, :) (master) to the slave (sockets(2, :)).
> > > >
> > > > As I said, I'm unable to reproduce the problem. Maybe it won't
> > > > help, but why don't you send a real session transcript
> > > > (cut-and-paste from your terminal running Octave) and indicate
> > > > exactly the command which gets stuck? Commands which you only
> > > > intended to give are of no use to me. Since I have no notion as
> > > > yet what the cause of the problem is, the contents of the
> > > > variables "hosts" and "sockets" may be important; why don't you
> > > > show them? Of course you should hide the real hostnames, but I
> > > > have to see whether they are different, whether the local
> > > > machine is among the servers, and what the length of the
> > > > hostnames is.
> > > >
> > > > Do you use Octave-3.2.4 and parallel-2.0.2 on _all_ machines?
> > > >
> > > > What you still can do is to check whether the server process and
> > > > child process are running before and after the command that gets
> > > > stuck (on each server machine: ps ax | grep octave) and post the
> > > > output (replacing hostnames, of course).
> > > >
> > > > Olaf
> > >
> > > I am using octave-3.2.4 from the maverick repo and parallel-2.0.2
> > > built from source. I am also getting the same issue with big
> > > matrices. I could not send more than 32767 (2^15 - 1) elements of
> > > type double (8 bytes each) = 262136 bytes.
> > > The reason may be an incorrect buffer size: the bufsize in
> > > pserver.cc at line 507 is
> > >
> > >   int bufsize = 262144;
> > >
> > > A possible solution is to change it to
> > >
> > >   int bufsize = BUFF_SIZE;
> > >
> > > Now the number of elements increases to about 46k, which
> > > interestingly comes out to be a magic number equal to
> > > 2^15 * sqrt(2). Quite amazing!
> > > I think there is still some other issue which stalls sending
> > > matrices larger than this size.
> > >
> > > -Vipul
> >
> > "send" does not return until the whole value is written to the
> > socket. If the value's length exceeds the socket's buffer size, a
> > process at the far end of the connection must read data for "send"
> > to be able to return. So before "send" in the master process, one
> > must first start "recv" on the other end, e.g.:
> >
> > octave:13> reval ("send (recv (sockets(1, :)), sockets(1, :))", sockets(2, :))
> > octave:14> send (ones (100, 1000000), sockets(2, :))
> > octave:15> size (recv (sockets(2, :)))
> > ans =
> >
> >        100   1000000
> >
> > octave:16>
> >
> > I don't know why the socket buffer size for outgoing connections
> > has been set to a lower value than for incoming connections in
> > pserver.cc; this probably should be corrected, since BUFF_SIZE (the
> > higher value) is considered by send.cc. But this should not be
> > essential (only a matter of efficiency).
> >
> > Thanks for the report.
> >
> > Olaf
>
> Hi Olaf,
> Your trick works fine. Now I am able to send and receive large
> matrices, but it seems that the data becomes corrupted during
> transfer.
> For example:
>
> hosts = ['host1';'host2';'host3'];
> sockets = connect(hosts)
>
> sockets =
>
>       0     0     0
>      13    11  1234
>      14    12  1234
>
> a = rand(50);
> reval("temp = recv(sockets(1,:));", sockets(2,:));
> send(a, sockets(2,:));
> reval("send(temp, sockets(1,:))", sockets(2,:));
> b = recv(sockets(2,:));
> isequal(a, b)
>
> ans = 0
>
> octave:> a - b
>
> Columns 1-7 are all zeros; columns 9-50 are all non-zero.
>
> Columns 6 through 10:
>
>    0.0000e+00    0.0000e+00    0.0000e+00    6.1764e+21    2.0570e-01
>    0.0000e+00    0.0000e+00    0.0000e+00    3.4265e-02   -2.3695e+35
>    0.0000e+00    0.0000e+00    0.0000e+00    2.0374e+20   -1.0870e+267
>    0.0000e+00    0.0000e+00    0.0000e+00    9.0336e-01   -2.9740e+207
>    0.0000e+00    0.0000e+00    0.0000e+00    5.2625e-01    2.1413e-01
>    0.0000e+00    0.0000e+00    0.0000e+00    8.7713e-01   -1.8943e+280
>    0.0000e+00    0.0000e+00    0.0000e+00   -2.1483e+70    9.9818e+50
>    0.0000e+00    0.0000e+00    0.0000e+00    7.4401e-01    6.9315e+09
>    0.0000e+00    0.0000e+00    0.0000e+00    5.0168e-01    7.7556e-01
>    0.0000e+00    0.0000e+00    0.0000e+00   -1.1288e+161   1.0402e-01
>    0.0000e+00    0.0000e+00   -4.1461e+32    8.0617e-01    1.1567e+134
>    0.0000e+00    0.0000e+00    1.7965e+251  -2.4074e+254   3.1141e+147
>    0.0000e+00    0.0000e+00    1.5704e+79    4.5556e+69    8.1072e-01
>    0.0000e+00    0.0000e+00   -6.7451e+37    1.1328e-01    1.0592e+83
>    0.0000e+00    0.0000e+00    7.5392e-01   -4.6222e+83    3.1915e-01
> .....
>
> Has anybody else reproduced a similar problem?
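[Editor's note: Olaf's point further up -- that "send" cannot return once
the value's length exceeds the socket buffer, unless the far end is
already reading -- can be illustrated outside Octave. This is a generic
Python sketch, not the parallel package's implementation; the buffer
size and payload length are made up for the demonstration.]

```python
import socket

# A connected pair of local sockets stands in for the master/slave link.
master, slave = socket.socketpair()

# Shrink the sender's buffer so the effect shows with a small payload.
master.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4096)
master.setblocking(False)  # so a full buffer returns instead of hanging

payload = b"x" * 1_000_000  # far larger than the send buffer

# With nobody reading on 'slave', the kernel can accept only about a
# buffer's worth of data; a blocking send() would stall at this point
# until the far end started to recv() -- the "stuck" send in the report.
sent = master.send(payload)
print(sent < len(payload))  # True: only part of the payload was accepted

master.close()
slave.close()
```

This is why starting "recv" on the remote side (via "reval") before
calling "send" locally avoids the stall: the far end drains the buffer
while the local "send" is still writing.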
Not me --- for me, "a" and "b" are identical. Several thoughts:

- While Octave, when saving and loading data, is AFAIK supposed to
  care for endian-ness and possible differences in floating point
  format, "send" and "recv" do not use the saving and loading
  functionality of Octave, and only consider endian-ness.

- It is difficult to rewrite "send" and "recv" to use the above
  functionality of Octave, since the latter is based on Octave's
  stream ids, which know nothing of the "externally" allocated
  sockets.

- However, if the first issue is the reason for the corruption, I
  wonder why the first columns of "a - b" should be zero.

- Does your slave machine have a different architecture than the
  master? If yes, you could test the above with

    hosts = ["localhost"; "localhost"];

  or, if possible, with a slave machine whose architecture is
  identical to the master's.

Olaf

_______________________________________________
Octave-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/octave-dev
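[Editor's note: Olaf states that "send" and "recv" do consider
endian-ness, so this may not be the cause here. Purely to illustrate
what a missed byte-order conversion would look like, here is a generic
Python sketch (the value is arbitrary, chosen to resemble an entry of
rand(50)) showing that a double written little-endian and re-read
big-endian comes back with the wild magnitudes seen in "a - b":]

```python
import struct

value = 0.74401  # an arbitrary value, like an entry of rand(50)

# Pack as a little-endian IEEE 754 double (e.g. an x86 master) ...
raw = struct.pack("<d", value)

# ... and unpack the same 8 bytes as big-endian (a hypothetical slave
# that skips the byte-order conversion):
misread = struct.unpack(">d", raw)[0]

# Reading with the matching byte order recovers the value exactly,
# while the byte-swapped read yields a wildly different number.
print(struct.unpack("<d", raw)[0] == value)  # True
print(misread == value)                      # False
```

A byte-order bug would corrupt every transferred element, though, which
is consistent with Olaf's puzzlement over the all-zero leading columns
of "a - b".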
