For the record, this was caused by a rookie mistake: FD exhaustion.
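
For anyone hitting the same thing, here is a minimal sketch of one way to watch a process's descriptor count against its limit. It assumes Linux with /proc mounted; the helper names fd_usage and near_exhaustion and the 0.8 threshold are illustrative choices, not anything from Solr itself.

```python
import os


def fd_usage(pid="self"):
    """Return (open_fds, soft_limit) for a process, read from Linux /proc.

    soft_limit is None when the limit is reported as "unlimited".
    """
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    soft_limit = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Fields: "Max", "open", "files", <soft>, <hard>, "files"
                value = line.split()[3]
                soft_limit = None if value == "unlimited" else int(value)
                break
    return open_fds, soft_limit


def near_exhaustion(open_fds, soft_limit, threshold=0.8):
    """True when usage is at or above `threshold` of the soft limit."""
    if soft_limit is None:
        return False
    return open_fds >= threshold * soft_limit


if __name__ == "__main__":
    used, limit = fd_usage()  # pass the server's pid to inspect it instead
    print(f"open fds: {used}, soft limit: {limit}")
    if near_exhaustion(used, limit):
        print("warning: close to FD exhaustion")
```

Reading another user's process this way generally needs matching permissions. The soft limit itself is the value reported by `ulimit -n` in the shell that launched the process.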
--Casey
On 8/24/12 11:24 AM, Casey Callendrello wrote:
> Hi there,
> I have been doing some load testing with Solr 4 beta (now trunk). My
> configuration is fairly simple - two servers, replicating via
> SolrCloud. SolrCloud is configured as recommended in the wiki:
>
>
>
>
>
>
>
> Twice now I've seen sudden thread and file-descriptor spikes along
> with a complete deadlock, simultaneously on both machines. My max FD
> limit is set to 1024, and (excepting the spikes) I never see usage
> over 375 FDs.
>
> The first FD spike was with an older trunk revision. It coincided
> with a corrupt transaction log. I've lost the logs, unfortunately,
> but Solr tried to re-process the same log over and over, leaking FDs
> and dying.
>
> The upgraded version has not reported the corrupt transaction issue
> prior to deadlock. However, according to the log files, the deadlock
> persists for about 5 minutes prior to FD exhaustion. The last log line
> is simply "INFO: end_commit_flush"
>
> Upon restart, I see a frightening number of corrupt transaction log
> exceptions and "New transaction log already exists" exceptions.
>
> Any thoughts?
> Contact me for the thread dump; it's 1 MiB.
>
> Thanks,
> --Casey C.