Just cc both lists so that you don't have to repeat the email and we can get everyone's feedback.
On Mar 22, 2011, at 8:44 AM, Bela Ban wrote:

> I cross-posted this to the JGroups mailing lists [1]
>
> [1] https://sourceforge.net/mail/?group_id=6081
>
> On 3/22/11 2:05 AM, Dave wrote:
>> I switched back to UDP today based on your feedback. Our config resembles
>> the config below. Like I said, we just increased sizes and timeouts. If you
>> ask me why I tweaked a certain parameter, my response would be that it
>> seemed like a good idea based on the JGroups documentation. UDP seemed a
>> little more problematic than TCP, though I'm not sure why.
>>
>> <config xmlns="urn:org:jgroups"
>>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>         xsi:schemaLocation="urn:org:jgroups file:schema/JGroups-2.8.xsd">
>>    <UDP
>>         mcast_addr="${jgroups.udp.mcast_addr:228.6.7.8}"
>>         mcast_port="${jgroups.udp.mcast_port:46655}"
>>         tos="8"
>>         ucast_recv_buf_size="20000000"
>>         ucast_send_buf_size="640000"
>>         mcast_recv_buf_size="25000000"
>>         mcast_send_buf_size="640000"
>>         loopback="true"
>>         discard_incompatible_packets="true"
>>         max_bundle_size="4000000"
>>         max_bundle_timeout="30"
>>         ip_ttl="${jgroups.udp.ip_ttl:2}"
>>         enable_bundling="true"
>>         enable_diagnostics="false"
>>
>>         thread_naming_pattern="pl"
>>
>>         thread_pool.enabled="true"
>>         thread_pool.min_threads="2"
>>         thread_pool.max_threads="30"
>>         thread_pool.keep_alive_time="5000"
>>         thread_pool.queue_enabled="true"
>>         thread_pool.queue_max_size="1000"
>>         thread_pool.rejection_policy="Discard"
>>
>>         oob_thread_pool.enabled="true"
>>         oob_thread_pool.min_threads="2"
>>         oob_thread_pool.max_threads="30"
>>         oob_thread_pool.keep_alive_time="5000"
>>         oob_thread_pool.queue_enabled="true"
>>         oob_thread_pool.queue_max_size="1000"
>>         oob_thread_pool.rejection_policy="Discard"
>>    />
>>
>>    <PING timeout="360000" num_initial_members="400"
>>          break_on_coord_rsp="false"/>
>>    <MERGE2 max_interval="30000" min_interval="10000"/>
>>    <FD_SOCK/>
>>    <FD_ALL/>
>>    <BARRIER />
>>    <pbcast.NAKACK use_stats_for_retransmission="false"
>>                   exponential_backoff="0"
>>                   use_mcast_xmit="true" gc_lag="0"
>>                   retransmit_timeout="300,600,1200,2400,3600,4800"
>>                   discard_delivered_msgs="true"/>
>>    <UNICAST timeout="300,600,1200,2400,3600,4800"/>
>>    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
>>                   max_bytes="1000000"/>
>>    <pbcast.GMS print_local_addr="false" join_timeout="60000"
>>                view_bundling="true" use_flush_if_present="false"/>
>>    <UFC max_credits="2000000" min_threshold="0.20"/>
>>    <MFC max_credits="2000000" min_threshold="0.20"/>
>>    <FRAG2 frag_size="2000000" />
>>    <pbcast.STREAMING_STATE_TRANSFER/>
>>    <!--<pbcast.STATE_TRANSFER/> -->
>>    <pbcast.FLUSH timeout="0"/>
>> </config>
>>
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Bela Ban
>> Sent: Saturday, March 19, 2011 1:15 PM
>> To: [email protected]
>> Subject: Re: [infinispan-dev] Infinispan Large Scale support
>>
>> Hard to believe that TCP would be better, as TCP creates a mesh of
>> connections; for 400 nodes, with every node sending, you'll have roughly
>> 400*400 connections!
>>
>> I've always had a much better experience with UDP.
>>
>> On 3/19/11 2:37 PM, david marion wrote:
>>>
>>> Initially yes, but I think we are getting better stability using TCP. I
>>> switched it back to TCP yesterday. I can post specifics of what I did in
>>> the TCP configuration, but the short story is that I increased a lot of
>>> the timeout values to get it to work.
>>>
>>> Dave Marion
>>>
>>>
>>>> Date: Sat, 19 Mar 2011 10:50:54 +0100
>>>> From: [email protected]
>>>> To: [email protected]
>>>> Subject: Re: [infinispan-dev] Infinispan Large Scale support
>>>>
>>>> On 3/18/11 10:35 PM, Dave wrote:
>>>>> Won't be able to get CR4 uploaded; policy dictates that I wait until
>>>>> the final release. However, I was able to get 431 nodes up and running
>>>>> as a replicated cluster and 115 nodes up as a distributed cluster. For
>>>>> the 430-node cache, I was able to get it started with no problems about
>>>>> 50% of the time. When they formed multiple clusters, they merged
>>>>> together only some of the time. It really does appear to be a startup
>>>>> issue at this point. We have not pushed it hard enough yet to see what
>>>>> happens at this scale under load.
>>>>>
>>>>> Any idea when CR4 will be FINAL?
>>>>>
>>>>> Are there any tools to help diagnose problems / performance at this
>>>>> scale? (I ended up writing my own monitor program.)
>>>>
>>>> Yes, there's probe.sh at the JGroups level. I created a JIRA [1] to
>>>> provide a sample for large clusters. You said you based your config on
>>>> udp.xml, correct?
>>>>
>>>> [1] https://issues.jboss.org/browse/JGRP-1307
>>>>
>>>> --
>>>> Bela Ban
>>>> Lead JGroups / Clustering Team
>>>> JBoss
>
> --
> Bela Ban
> Lead JGroups / Clustering Team
> JBoss

--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache

_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
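[Editor's note: a quick back-of-the-envelope sketch of the connection-mesh arithmetic Bela raises in the thread above. The class name `MeshCount` and the plain-Java form are mine; only the cluster size (400) and the "roughly 400*400" figure come from the thread.]

```java
// With n nodes all sending to each other over TCP, there is one
// connection per pair of nodes, whereas IP multicast over UDP needs
// no per-pair connections at all.
public class MeshCount {
    public static void main(String[] args) {
        int n = 400; // cluster size from the thread

        // Ordered (sender, receiver) pairs, excluding self-connections:
        // n * (n - 1), i.e. "roughly 400*400" as stated above.
        int directedPairs = n * (n - 1);

        // A TCP connection serves both directions, so the number of
        // distinct connections is half the directed-pair count.
        int tcpConnections = directedPairs / 2;

        System.out.println("directed pairs:  " + directedPairs);
        System.out.println("TCP connections: " + tcpConnections);
    }
}
```

For n = 400 this works out to 159,600 directed pairs and 79,800 distinct TCP connections cluster-wide, which is why a connection-oriented mesh gets painful at this scale while multicast does not.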
