Thanks for the nice report! I'll look at it tomorrow.

On Thu, Feb 17, 2011 at 12:00 AM, Paweł Brach <[email protected]> wrote:
> Unfortunately there are still some problems with communication.
> I didn't get any errors like a connection-loss exception, but I'm sending
> a message with the tag:
> byte[] tagName = Bytes.toBytes("TEST_TAG");
> and once (only!) during my experiment I received something like:
> String msgTag = Bytes.toString(received.getTag());
> // msgTag = "[B@56c163f"
>
> It looks like messages are sometimes corrupted.
>
> Cheers,
> Pawel
>
> PS. It would be great to see your benchmark results.
>
> 2011/2/16 Edward J. Yoon <[email protected]>
>
>> I decided to add a "random communication benchmark" tool. This week
>> (or next week), I'll share my benchmarking experience with you. I have
>> 20 servers (160 cores).
>>
>> Thanks.
>>
>> 2011/2/16 Edward J. Yoon <[email protected]>:
>> > Looks like a sync problem. Can you try it again after adding a
>> > Thread.sleep(100); line?
>> >
>> > Sent from my iPhone
>> >
>> > On 2011. 2. 16., at 3:24 PM, Paweł Brach <[email protected]> wrote:
>> >
>> >> Yes, of course I have. My cluster has been configured and both examples,
>> >> PiEstimator and SerializePrinting, work (there is communication between 3
>> >> nodes). I've modified your example, PiEstimator (put everything in a
>> >> loop), and it works for a few iterations (there is communication), and
>> >> after that the connection is lost. The connection is then re-established,
>> >> but some messages are missing. It looks like the Hama framework is very
>> >> unstable when it's loaded and many messages are being sent between nodes.
>> >> On the same cluster I've configured Apache Hadoop and it's very stable.
>> >> If you have your own cluster configured, could you run my example on it?
>> >> Have you ever run something more complicated than PiEstimator and
>> >> SerializePrinting on it?
>> >>
>> >> Cheers,
>> >> Pawel
>> >>
>> >> 2011/2/16 Chia-Hung Lin <[email protected]>
>> >>
>> >>> Have you configured zookeeper in hama-site.xml? Hama makes use of
>> >>> zookeeper to do node communication IIRC.
>> >>>
>> >>> Opening socket connection to server cl5/127.0.1.1:2181
>> >>>
>> >>> seems to indicate that only localhost is up. If this is the case, you
>> >>> can change the hama.zookeeper.quorum property, with its value set to
>> >>> e.g.
>> >>>
>> >>> <property>
>> >>>   <name>hama.zookeeper.quorum</name>
>> >>>   <value>node1,node2,node3,node4,node5</value>
>> >>> </property>
>> >>>
>> >>> Hope it helps
>> >>>
>> >>> 2011/2/15 Paweł Brach <[email protected]>:
>> >>>> Hello,
>> >>>>
>> >>>> During the last few days I've been testing Hama solutions, and today I
>> >>>> found a strange error in the Hama framework. If you run a simple job
>> >>>> with more than a few supersteps, the following error occurs:
>> >>>>
>> >>>> 2011-02-15 15:13:55,934 ERROR org.apache.hama.bsp.BSPPeer:
>> >>>> 2011-02-15 15:13:56,525 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server cl5/127.0.1.1:2181
>> >>>> 2011-02-15 15:13:56,526 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
>> >>>> java.net.ConnectException: Connection refused
>> >>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>> >>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
>> >>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
>> >>>> 2011-02-15 15:13:56,626 ERROR org.apache.hama.bsp.BSPPeer:
>> >>>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp
>> >>>>
>> >>>> You can reproduce that by running PiEstimator (the newest source code
>> >>>> from svn) with a small change: put the whole body of the bsp() method
>> >>>> in a for loop. So add the following at the beginning:
>> >>>>
>> >>>> for (int j = 0; j < 100; j++) {
>> >>>>   // original bsp() code
>> >>>> }
>> >>>>
>> >>>> When I try to run it, the framework hangs and the error mentioned above
>> >>>> occurs.
>> >>>>
>> >>>> Your help will be appreciated.
>> >>>>
>> >>>> Cheers,
>> >>>>
>> >>>> --
>> >>>> Pawel Brach
>> >>>
>> >>> --
>> >>> ChiaHung Lin @ nuk, tw.
>> >>
>> >> --
>> >> Paweł Brach
>>
>> --
>> Best Regards, Edward J. Yoon
>> http://blog.udanax.org
>> http://twitter.com/eddieyoon
>
> --
> Paweł Brach
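
A note on the "[B@56c163f" value quoted above. In Java, a string of the form "[B@" followed by hex digits is what the default Object.toString() returns for a byte[]. One possible explanation (a hypothesis only, not confirmed anywhere in this thread) is that somewhere on the send or receive path an array's toString() result was encoded as the tag instead of the array's contents. The sketch below uses only the standard library to show the difference. The class and method names are made up for illustration and are not part of Hama; it also assumes the tag is UTF-8 text.

```java
import java.nio.charset.StandardCharsets;

public class TagDecodeSketch {
    // Decodes the tag's contents as UTF-8 text (assumption: tags are UTF-8).
    static String decode(byte[] tag) {
        return new String(tag, StandardCharsets.UTF_8);
    }

    // True when a decoded tag looks like the default toString() of a byte[]:
    // "[B@" followed by a hexadecimal identity hash.
    static boolean looksLikeArrayToString(String s) {
        return s.matches("\\[B@[0-9a-f]+");
    }

    public static void main(String[] args) {
        byte[] tag = "TEST_TAG".getBytes(StandardCharsets.UTF_8);

        // Correct round trip: the contents are decoded.
        System.out.println(decode(tag)); // prints TEST_TAG

        // Hypothetical mistake: toString() on the array yields "[B@<hash>",
        // and that text, not the contents, gets encoded as the tag.
        byte[] wrongTag = tag.toString().getBytes(StandardCharsets.UTF_8);
        System.out.println(looksLikeArrayToString(decode(wrongTag))); // prints true
    }
}
```

If a check like looksLikeArrayToString() ever fires on a received tag, that would point at a code path handling the array reference rather than at bytes being damaged in transit. It would not explain why the problem shows up only occasionally.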
--
Best Regards, Edward J. Yoon
http://blog.udanax.org
http://twitter.com/eddieyoon
