On Wed, Aug 11, 2010 at 4:45 PM, Olaf Till <[email protected]> wrote:

> On Wed, Aug 11, 2010 at 11:51:43AM +0530, Vipul Agrawal wrote:
> > >On Sat, Aug 07, 2010 at 05:31:24PM +1100, Lutaev D. A. wrote:
> > >> We used parallel-2.0.2 and we have problems with such code:
> > >>
> > >> clear;
> > >>
> > >> hosts = [];
> > >>
> > >> for i = 1:nargin
> > >>         hosts = [hosts; argv(){i, 1}];
> > >> end
> > >>
> > >> hosts
> > >>
> > >> sockets = connect(hosts)
> > >>
> > >> x = rand(50, 1000);
> > >>
> > >> send(x, sockets(2, :));
> > >> reval("x = recv(sockets(1, :))", sockets(2, :));
> > >> scloseall(sockets);
> > >>
> > >> Programm stucks when it's trying to send x from sockets(1, :) (master) to
> > >> slave (sockets(2, :)).
> > >
> > >As I said, I'm unable to reproduce the problem. Maybe it won't help,
> > >but why don't you send a real session transcript (cut-and-paste from
> > >your terminal running Octave) and indicate exactly the command which
> > >"stucks"? Commands which you only intended to give are of no use to
> > >me. Since I have no notion as yet what the cause of the problem is,
> > >the contents of the variables "hosts" and "sockets" may be important;
> > >why don't you show them? Of course you should hide the real hostnames,
> > >but I have to see whether they are different, whether the local
> > >machine is among the servers, and what is the length of the hostnames.
> > >
> > >Do you use Octave-3.2.4 and parallel-2.0.2 on _all_ machines?
> > >
> > >What you still can do is to check whether the server process and child
> > >process are running before and after the "stucking" command (on each
> > >server machine, run: ps ax | grep octave    and post the output,
> > >replacing hostnames, of course).
> > >
> > >Olaf
> > >
> >
> > I am using octave-3.2.4 from the maverick repo and parallel-2.0.2 built from
> > source. I am also getting the same issue with big matrices.
> > I could not send more than 32767 elements (2^15 - 1) of type double (8 bytes
> > each), i.e. 262136 bytes.
> > The reason may be an incorrect buffer size.
> > The bufsize in pserver.cc at line 507 is:
> >     int bufsize = 262144;
> > A possible solution is to change to
> >     int bufsize = BUFF_SIZE;
> >
> > Now the number of elements increases to about 46k, which interestingly comes
> > out to be a magic number equal to 2^15 * sqrt(2). Quite amazing!
> > I think there is still some other issue which stalls sending matrices larger
> > than this size.
> >
> > -Vipul
>
> "send" does not return until the whole value is written to the
> socket. If the value's length exceeds the socket's buffer size, a
> process at the far end of the connection must read data for "send"
> to be able to return. So before "send" in the master process, one must
> first start "recv" on the other end, e.g.:
>
> octave:13> reval ("send (recv (sockets(1, :)), sockets(1, :))", sockets(2, :))
> octave:14> send (ones (100, 1000000), sockets(2, :))
> octave:15> size (recv (sockets(2, :)))
> ans =
>
>       100   1000000
>
> octave:16>
>
> I don't know why the socket buffer size for outgoing connections has
> been set to a lower value than for incoming connections in pserver.cc;
> this probably should be corrected, since BUFF_SIZE (the higher value)
> is considered by send.cc. But this should not be essential (only a
> matter of efficiency).
>
> Thanks for the report.
>
> Olaf
>
Hi Olaf,
Your trick works fine. Now, I am able to send and receive large matrices.
But it seems that the data becomes corrupted during transfer.
For example:
hosts = ['host1';'host2';'host3'];
sockets = connect(hosts)

sockets =

      0      0      0
     13     11   1234
     14     12   1234

a = rand(50);
reval("temp = recv(sockets(1,:));",sockets(2,:));
send(a,sockets(2,:));
reval("send(temp,sockets(1,:))",sockets(2,:));
b = recv(sockets(2,:));
isequal(a,b)

ans = 0

octave:> a - b
Columns 1-7 are all zeros; columns 9-50 are all non-zero.
Columns 6 through 10:

    0.0000e+00    0.0000e+00    0.0000e+00    6.1764e+21    2.0570e-01
    0.0000e+00    0.0000e+00    0.0000e+00    3.4265e-02   -2.3695e+35
    0.0000e+00    0.0000e+00    0.0000e+00    2.0374e+20  -1.0870e+267
    0.0000e+00    0.0000e+00    0.0000e+00    9.0336e-01  -2.9740e+207
    0.0000e+00    0.0000e+00    0.0000e+00    5.2625e-01    2.1413e-01
    0.0000e+00    0.0000e+00    0.0000e+00    8.7713e-01  -1.8943e+280
    0.0000e+00    0.0000e+00    0.0000e+00   -2.1483e+70    9.9818e+50
    0.0000e+00    0.0000e+00    0.0000e+00    7.4401e-01    6.9315e+09
    0.0000e+00    0.0000e+00    0.0000e+00    5.0168e-01    7.7556e-01
    0.0000e+00    0.0000e+00    0.0000e+00  -1.1288e+161    1.0402e-01
    0.0000e+00    0.0000e+00   -4.1461e+32    8.0617e-01   1.1567e+134
    0.0000e+00    0.0000e+00   1.7965e+251  -2.4074e+254   3.1141e+147
    0.0000e+00    0.0000e+00    1.5704e+79    4.5556e+69    8.1072e-01
    0.0000e+00    0.0000e+00   -6.7451e+37    1.1328e-01    1.0592e+83
    0.0000e+00    0.0000e+00    7.5392e-01   -4.6222e+83    3.1915e-01
    .....
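
To narrow this down, it may help to find where the first mismatch sits in the data. Below is a minimal sketch that runs without the parallel package. The matrices are stand-ins: in a real session, a is the matrix sent and b the one received back, and the "corruption" here is made up for illustration. The byte offset assumes the payload is just the doubles in column-major order, 8 bytes each, which I have not checked against send.cc.

```octave
% Stand-in data (hypothetical): replace a and b with the sent and received matrices.
a = reshape (1:2500, 50, 50);
b = a;
b(361:end) = -b(361:end);   % made-up corruption, for illustration only

bad = find (a(:) ~= b(:));  % linear (column-major) indices of differing elements

if isempty (bad)
  disp ("a and b are identical");
else
  first_bad = bad(1);
  [row, col] = ind2sub (size (a), first_bad);
  % Assumption: elements go over the wire as contiguous 8-byte doubles.
  byte_offset = (first_bad - 1) * 8;
  printf ("first mismatch at row %d, column %d (byte offset %d)\n", ...
          row, col, byte_offset);
  printf ("%d of %d elements differ\n", numel (bad), numel (a));
end
```

If the first mismatch lands at the same offset on every run, that would hint at a fixed-size buffer boundary rather than random damage.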

Has anybody else reproduced a similar problem?

Thanks for your cooperation,
-Vipul
_______________________________________________
Octave-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/octave-dev
