[
https://issues.apache.org/jira/browse/JCRVLT-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joerg Hoh resolved JCRVLT-810.
------------------------------
Resolution: Won't Fix
I won't pursue this approach right now, as it can be a bit risky to skip the
checks.
> Checking workspacefilter slowing down packaging import
> ------------------------------------------------------
>
> Key: JCRVLT-810
> URL: https://issues.apache.org/jira/browse/JCRVLT-810
> Project: Jackrabbit FileVault
> Issue Type: Task
> Affects Versions: 4.0.0
> Reporter: Joerg Hoh
> Assignee: Joerg Hoh
> Priority: Major
>
> I am investigating how to improve the performance of importing content
> packages in AEM that contain about 60 individual pages with ~1500 nodes
> each (and ~14k properties in these 1500 nodes). These content packages are
> created and imported by FileVault. The filter.xml looks like this:
> {noformat}
> <filter root="/content/fooo/bar">
> <include pattern="/\Qcontent/foo/bar\E"/>
> <include pattern="\Q/content/foo/bar\E/.*"/>
> <exclude pattern=".*rep:policy"/>
> <exclude pattern=".*rep:repoPolicy"/>
> <exclude pattern="^.*/cq:lastReplicated.*" matchProperties="true"/>
> <exclude pattern="^.*/cq:lastReplicatedBy.*" matchProperties="true"/>
> <exclude pattern="^.*/cq:lastReplicationAction.*" matchProperties="true"/>
> <exclude pattern="^.*/cq:isDelivered.*" matchProperties="true"/>
> <exclude pattern="^.*/jcr:isCheckedOut.*" matchProperties="true"/>
> <exclude pattern="^.*/jcr:baseVersion.*" matchProperties="true"/>
> <exclude pattern="^.*/jcr:predecessors.*" matchProperties="true"/>
> <exclude pattern="^.*/jcr:versionHistory.*" matchProperties="true"/>
> <exclude pattern="^.*/jcr:activity.*" matchProperties="true"/>
> <exclude pattern="^.*/jcr:configuration.*" matchProperties="true"/>
> </filter>
> [the same structure for the other 59 pages in this package]
> {noformat}
> During this investigation I already applied a series of improvements (not all
> yet reported, and not all yet committed), and I was able to bring the time
> down to 76 seconds.
> Now I see a lot of situations where the stack looks like this:
> {noformat}
> at java.util.regex.Pattern$CharPropertyGreedy.match([email protected]/Pattern.java:4461)
> at java.util.regex.Pattern$Begin.match([email protected]/Pattern.java:3851)
> at java.util.regex.Matcher.match([email protected]/Matcher.java:1794)
> at java.util.regex.Matcher.matches([email protected]/Matcher.java:754)
> at org.apache.jackrabbit.vault.fs.filter.DefaultPathFilter.matches(DefaultPathFilter.java:92)
> at org.apache.jackrabbit.vault.fs.api.PathFilterSet.contains(PathFilterSet.java:103)
> at org.apache.jackrabbit.vault.fs.config.DefaultWorkspaceFilter.includesProperty(DefaultWorkspaceFilter.java:273)
> at org.apache.jackrabbit.vault.fs.impl.io.DocViewImporter.setUnprotectedProperties(DocViewImporter.java:1280)
> at org.apache.jackrabbit.vault.fs.impl.io.DocViewImporter.createNewNode(DocViewImporter.java:1182)
> at org.apache.jackrabbit.vault.fs.impl.io.DocViewImporter.addNode(DocViewImporter.java:931)
> at org.apache.jackrabbit.vault.fs.impl.io.DocViewImporter.startDocViewNode(DocViewImporter.java:410)
> at org.apache.jackrabbit.vault.fs.impl.io.DocViewSAXHandler.startElement(DocViewSAXHandler.java:353)
> at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement([email protected]/AbstractSAXParser.java:518)
> {noformat}
> Here it seems that a lot of time is spent checking the WorkspaceFilter to
> determine whether the nodes in the content packages are actually covered by
> the filters and are therefore allowed to be imported. Given the specific
> circumstances in this case, this is consistently true.
> To assess the potential impact of an improvement in this case, I
> short-circuited the logic in {{DefaultWorkspaceFilter.includesProperty}} so
> that it always returns {{true}}. With this change I was able to bring the
> import time of the package down to 52s in my test, an improvement of more
> than 30%. This shows that there is a large potential impact in improving
> this logic, but I don't see how it can be improved significantly (the
> regexes are already compiled): during this package installation we do 14
> regex checks per property * 15k properties per page * 60 pages = 12.6M regex
> matches. But getting rid of these checks in the general case is also not
> possible.
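The estimate above can be reproduced with a minimal Java sketch. It is an
illustration only, not FileVault code: the class FilterCostSketch, the Rule
type and the "every rule is evaluated, last matching rule decides" semantics
are assumptions made for this example, and the sketch ignores the distinction
between node and property filters that matchProperties introduces. The 14
patterns and the figures 14, 15k and 60 are taken from the report.

```java
import java.util.List;
import java.util.regex.Pattern;

final class FilterCostSketch {

    /** Hypothetical filter entry: a precompiled pattern plus include/exclude. */
    static final class Rule {
        final Pattern pattern;
        final boolean include;

        Rule(String regex, boolean include) {
            this.pattern = Pattern.compile(regex);
            this.include = include;
        }
    }

    static Rule include(String regex) { return new Rule(regex, true); }
    static Rule exclude(String regex) { return new Rule(regex, false); }

    // The 14 patterns of one <filter> element from the report, in document order.
    static final List<Rule> RULES = List.of(
        include("/\\Qcontent/foo/bar\\E"),
        include("\\Q/content/foo/bar\\E/.*"),
        exclude(".*rep:policy"),
        exclude(".*rep:repoPolicy"),
        exclude("^.*/cq:lastReplicated.*"),
        exclude("^.*/cq:lastReplicatedBy.*"),
        exclude("^.*/cq:lastReplicationAction.*"),
        exclude("^.*/cq:isDelivered.*"),
        exclude("^.*/jcr:isCheckedOut.*"),
        exclude("^.*/jcr:baseVersion.*"),
        exclude("^.*/jcr:predecessors.*"),
        exclude("^.*/jcr:versionHistory.*"),
        exclude("^.*/jcr:activity.*"),
        exclude("^.*/jcr:configuration.*"));

    // Number of Matcher.matches() calls made so far.
    static long regexCalls = 0;

    /** Assumed semantics: every rule is evaluated, the last matching rule decides. */
    static boolean isIncluded(String path) {
        boolean result = false;
        for (Rule rule : RULES) {
            regexCalls++;
            if (rule.pattern.matcher(path).matches()) {
                result = rule.include;
            }
        }
        return result;
    }

    /** Total regex matches for a package, using the figures quoted in the report. */
    static long totalMatches(int patternsPerFilter, int propertiesPerPage, int pages) {
        return (long) patternsPerFilter * propertiesPerPage * pages;
    }

    public static void main(String[] args) {
        System.out.println(isIncluded("/content/foo/bar/jcr:content/jcr:title"));         // true
        System.out.println(isIncluded("/content/foo/bar/jcr:content/cq:lastReplicated")); // false
        System.out.println(regexCalls);                       // 28: 14 checks per path
        System.out.println(totalMatches(14, 15_000, 60));     // 12600000
    }
}
```

Because the loop has no early exit, precompiling the patterns only removes
the compile cost; the number of matches per property stays at 14.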
> For that reason I am thinking about introducing a new flag in the
> ImportOptions, with which the code invoking the installation of a package
> can request that these checks are skipped. This information is then passed
> down to the WorkspaceFilter, which can take it into consideration and skip
> the checks. In my specific case this is possible, as I control the creation
> of the packages, the transport to the consumer's side, and the code
> triggering the import of the package.
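The proposed flag could look roughly like the following Java sketch. All
names here (SkipFilterSketch, Options, skipFilterChecks, importProperties)
are hypothetical and do not exist in FileVault; the filter is a toy
predicate standing in for the WorkspaceFilter. The sketch also shows why
skipping the checks is risky: a property that the filter would exclude is
imported when the flag is set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class SkipFilterSketch {

    /** Hypothetical stand-in for import options; not the FileVault ImportOptions class. */
    static final class Options {
        final boolean skipFilterChecks; // hypothetical flag name

        Options(boolean skipFilterChecks) {
            this.skipFilterChecks = skipFilterChecks;
        }
    }

    /** Returns the property paths that would be imported under the given options. */
    static List<String> importProperties(List<String> paths,
                                         Predicate<String> filter,
                                         Options options) {
        List<String> imported = new ArrayList<>();
        for (String path : paths) {
            // With the flag set, the (expensive) filter is never consulted.
            if (options.skipFilterChecks || filter.test(path)) {
                imported.add(path);
            }
        }
        return imported;
    }

    public static void main(String[] args) {
        // Toy filter: excludes a single property name.
        Predicate<String> filter = p -> !p.endsWith("/cq:lastReplicated");
        List<String> paths = List.of(
            "/content/foo/bar/jcr:title",
            "/content/foo/bar/cq:lastReplicated");

        // Checks enabled: only jcr:title is imported.
        System.out.println(importProperties(paths, filter, new Options(false)));
        // Checks skipped: both properties are imported, including the excluded one.
        System.out.println(importProperties(paths, filter, new Options(true)));
    }
}
```

The flag is only safe when the caller can guarantee that the package
content already conforms to its filter, as in the case described above.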
--
This message was sent by Atlassian Jira
(v8.20.10#820010)