Issue Type: Bug Bug
Assignee: Kevin Bell
Attachments: graph.png
Components: multiple-scms
Created: 13/Jun/12 5:55 PM
Description:

I was investigating why my jenkins.log file is 13GB (even after rotating daily, and it's only halfway through the day currently), and noticed that the majority of the file (so far I've only managed to confirm up to the first 1 million lines of the file) comes from an exception emitted by the Multiple-SCMs plugin:

org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
	at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
	at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.endEntity(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.endEntity(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.endEntity(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanData(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanCDATASection(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
	at javax.xml.parsers.SAXParser.parse(Unknown Source)
	at javax.xml.parsers.SAXParser.parse(Unknown Source)
	at org.jenkinsci.plugins.multiplescms.MultiSCMChangeLogParser.parse(MultiSCMChangeLogParser.java:140)
	at org.jenkinsci.plugins.multiplescms.MultiSCMChangeLogParser$LogSplitter.endElement(MultiSCMChangeLogParser.java:107)
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
	at javax.xml.parsers.SAXParser.parse(Unknown Source)
	at javax.xml.parsers.SAXParser.parse(Unknown Source)
	at org.jenkinsci.plugins.multiplescms.MultiSCMChangeLogParser.parse(MultiSCMChangeLogParser.java:140)
	at hudson.model.AbstractBuild.calcChangeSet(AbstractBuild.java:826)
	at hudson.model.AbstractBuild.getChangeSet(AbstractBuild.java:800)
	at hudson.model.AbstractBuild.hasParticipant(AbstractBuild.java:362)
	at hudson.model.AbstractProject.hasParticipant(AbstractProject.java:1389)
	at hudson.model.User.getProjects(User.java:409)
	at hudson.scm.MailAddressResolverImpl.findMailAddressFor(MailAddressResolverImpl.java:21)
	at hudson.tasks.MailAddressResolver.resolve(MailAddressResolver.java:100)
	at hudson.tasks.Mailer$UserProperty.getAddress(Mailer.java:514)
	at org.jenkinsci.plugins.gravatar.UserGravatarResolver.findAvatarFor(UserGravatarResolver.java:67)
	at org.jenkinsci.plugins.gravatar.GravatarCheckAsyncPeriodicWork.execute(GravatarCheckAsyncPeriodicWork.java:56)
	at hudson.model.AsyncPeriodicWork$1.run(AsyncPeriodicWork.java:53)
	at java.lang.Thread.run(Thread.java:636)

The way I confirmed up to the first 1 million lines is using head/tail:

# head /var/log/jenkins/jenkins.log -n 1000000 | tail
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.endEntity(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.endEntity(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanData(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanCDATASection(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)

I'm trying to use wc -l to figure out exactly how many lines are in the file to begin with, and after several minutes of waiting, I see the current log file is 160 million lines and 13 GB.

On further checks, I'm not confirming that up to 50 million lines is the same exception:

# head /var/log/jenkins/jenkins.log -n 50000000 | tail
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
	at javax.xml.parsers.SAXParser.parse(Unknown Source)
	at javax.xml.parsers.SAXParser.parse(Unknown Source)
	at org.jenkinsci.plugins.multiplescms.MultiSCMChangeLogParser.parse(MultiSCMChangeLogParser.java:140)
	at org.jenkinsci.plugins.multiplescms.MultiSCMChangeLogParser$LogSplitter.endElement(MultiSCMChangeLogParser.java:107)
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(Unknown Source)

And just for kicks, I checked the end of the first 100 million lines... Yup, you guessed it:

# head /var/log/jenkins/jenkins.log -n 100000000 | tail
	at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.endEntity(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.endEntity(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.endEntity(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanData(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanCDATASection(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)

Here is an example of one of my builds' changelog.xml file (I've removed most of the internal information). The SCM polling we use is GitSCM:

<multi-scm-log>
<sub-log scm="hudson.plugins.git.GitSCM">
<![CDATA[Changes in branch origin/HEAD, between f2e9e5b564641405d5eccabd2f60fd74d05eb10f and f2e9e5b564641405d5eccabd2f60fd74d05eb10f
Changes in branch origin/master, between f2e9e5b564641405d5eccabd2f60fd74d05eb10f and f2e9e5b564641405d5eccabd2f60fd74d05eb10f
]]>
</sub-log>
<sub-log scm="hudson.plugins.git.GitSCM">
<![CDATA[Changes in branch base/master, between dc78f64a3ae4a006aacd83e6d247ef9770c910bd and dc78f64a3ae4a006aacd83e6d247ef9770c910bd
]]>
</sub-log>
<sub-log scm="hudson.plugins.git.GitSCM">
<![CDATA[Changes in branch common/master, between f1df64664f9a4573a812a307e22e365010ce6d02 and c6c5cb40bbc2f1dda11c8c47756adcb28b41961e
commit c6c5cb40bbc2f1dda11c8c47756adcb28b41961e
tree 83e64f5dda4e20d664e02afc5bf7fa04f5cbcf98
parent f1df64664f9a4573a812a307e22e365010ce6d02
author XXXX 1339444234 -0400
committer XXXX 1339604443 -0400

    XXXX
    
    Change-Id: I35edfcb6b5dee97c8eed0b17218b8f440917c11c
    Reviewed-on: http://gerrit.xxx

:100644 100644 0c5996cf2cafb64fd4643ce8cb7babc0b71d557b cefd6dbfbd9634a2784c42ec87119b3977c15af0 M	Path/Files..
:100644 100644 c46e7c64a04520d4143a257cb826cc18de306610 eb254e580056a7b2b67e315dc7fe90f55f661c76 M	Path/Files..
:100644 100644 66f985386cf7c3b9df445cfe6d5fc94a0ddd295e 33e4c22951a6f61034b7bf1ad9fd49f79a45daee M	Path/Files..
...
]]>
</sub-log>
</multi-scm-log>

The sheer size of this log file causes logrotate to flush all kernel cache buffers while it has to compress the log file, and causes the memory consumption pattern to look like the attached graph.

Project: Jenkins
Priority: Major Major
Reporter: Joe Hansche
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to