[
https://issues.apache.org/jira/browse/CONNECTORS-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239576#comment-14239576
]
Karl Wright commented on CONNECTORS-1118:
-----------------------------------------
Hi Aeham,
Both the trunk and dev_1x branches should already have a fix for the second problem:
{code}
rt.clearPreloadRequests();
for (int j = 0; j < docidHashes.length; j++)
{
  DocumentReference dr = set.get(j);
  docidHashes[j] = dr.getLocalIdentifierHash();
  docids[j] = dr.getLocalIdentifier();
  dataNames[j] = dr.getDataNames();
  dataValues[j] = dr.getDataValues();
  eventNames[j] = dr.getPrerequisiteEventNames();
  // Calculate desired document priority based on current queuetracker status.
  String[] bins = ManifoldCF.calculateBins(connector,dr.getLocalIdentifier());
  PriorityCalculator p = new PriorityCalculator(rt,connection,bins);
  priorities[j] = p;
  p.makePreloadRequest();
}
rt.preloadBinValues();
{code}
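The snippet above registers a preload request per document and then loads all bin values in a single call after the loop. As a rough, self-contained illustration of that two-phase pattern, here is a minimal sketch. The classes BinStore, Tracker and Calculator are hypothetical stand-ins invented for this example; they are not ManifoldCF classes, and the real PriorityCalculator may well work differently.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class PreloadSketch {
    /** Hypothetical shared store where every round trip is costly (for example, it needs a lock). */
    static class BinStore {
        int roundTrips = 0;

        Map<String, Double> loadAll(Set<String> bins) {
            roundTrips++;
            Map<String, Double> values = new HashMap<>();
            for (String bin : bins) {
                values.put(bin, (double) bin.length()); // placeholder value, not a real priority
            }
            return values;
        }
    }

    /** Hypothetical tracker: collects requests first, then resolves them all in one batch. */
    static class Tracker {
        private final BinStore store;
        private final Set<String> requested = new LinkedHashSet<>();
        private Map<String, Double> loaded = new HashMap<>();

        Tracker(BinStore store) { this.store = store; }

        void clearPreloadRequests() { requested.clear(); loaded = new HashMap<>(); }
        void addPreloadRequest(String bin) { requested.add(bin); }
        void preloadBinValues() { loaded = store.loadAll(requested); } // one round trip for all bins

        double getValue(String bin) {
            Double value = loaded.get(bin);
            if (value == null) {
                throw new IllegalStateException("bin not preloaded: " + bin);
            }
            return value;
        }
    }

    /** Hypothetical calculator: cheap to construct, defers all store access to the tracker. */
    static class Calculator {
        private final Tracker tracker;
        private final String[] bins;

        Calculator(Tracker tracker, String[] bins) { this.tracker = tracker; this.bins = bins; }

        void makePreloadRequest() {
            for (String bin : bins) {
                tracker.addPreloadRequest(bin);
            }
        }

        double getPriority() {
            double max = 0.0;
            for (String bin : bins) {
                max = Math.max(max, tracker.getValue(bin));
            }
            return max;
        }
    }

    /** Phase 1 registers requests per document; phase 2 loads once; then priorities are read. */
    static double[] prioritize(BinStore store, String[][] binsPerDoc) {
        Tracker tracker = new Tracker(store);
        Calculator[] calculators = new Calculator[binsPerDoc.length];
        tracker.clearPreloadRequests();
        for (int j = 0; j < binsPerDoc.length; j++) {
            calculators[j] = new Calculator(tracker, binsPerDoc[j]);
            calculators[j].makePreloadRequest();
        }
        tracker.preloadBinValues();
        double[] priorities = new double[binsPerDoc.length];
        for (int j = 0; j < binsPerDoc.length; j++) {
            priorities[j] = calculators[j].getPriority();
        }
        return priorities;
    }

    public static void main(String[] args) {
        BinStore store = new BinStore();
        double[] p = prioritize(store, new String[][] {{"a", "bb"}, {"ccc"}});
        System.out.println(p.length + " priorities, " + store.roundTrips + " round trip(s)");
    }
}
```

The point of the sketch is only that the number of costly round trips stays at one regardless of how many documents are in the batch.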
As for the first issue, it is not trivial to fix without changing the entire
IIncrementalIngester API. I'll have to consider how best to deal with that.
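To make the first issue concrete: the cost comes from building an expensive object on every check. The general remedy is to build it once per unit of work and reuse it. The sketch below shows that idea only. Connections and Activity are hypothetical names made up for this example, not the IIncrementalIngester API, and the sketch ignores thread safety and stale connection configuration, which a real fix would have to address.

```java
import java.util.function.Supplier;

class LazyConnectionsSketch {
    /** Hypothetical stand-in for an object that is expensive to construct. */
    static class Connections {
        boolean acceptsMimeType(String mimeType) { return mimeType.startsWith("text/"); }
        boolean acceptsLength(long length) { return length <= 1_000_000L; }
    }

    /** Hypothetical per-document activity that builds Connections at most once. */
    static class Activity {
        private final Supplier<Connections> factory;
        private Connections cached; // null until first use; not thread-safe

        Activity(Supplier<Connections> factory) { this.factory = factory; }

        private Connections connections() {
            if (cached == null) {
                cached = factory.get(); // the expensive step happens here, once
            }
            return cached;
        }

        boolean checkMimeTypeIndexable(String mimeType) {
            return connections().acceptsMimeType(mimeType);
        }

        boolean checkLengthIndexable(long length) {
            return connections().acceptsLength(length);
        }
    }

    public static void main(String[] args) {
        int[] built = {0};
        Activity activity = new Activity(() -> { built[0]++; return new Connections(); });
        activity.checkMimeTypeIndexable("text/plain");
        activity.checkLengthIndexable(1024L);
        System.out.println("Connections built: " + built[0]);
    }
}
```

Both checks share one Connections instance here, whereas the behaviour described in the issue corresponds to calling the factory inside each check.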
> Documents processed by the shared drive connector incur an unnecessary
> synchronisation hit
> ------------------------------------------------------------------------------------------
>
> Key: CONNECTORS-1118
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1118
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Framework core
> Affects Versions: ManifoldCF 1.7.2
> Reporter: Aeham Abushwashi
>
> Each document processed by the shared drive connector is passed through
> SharedDriveConnector#checkInclude to verify whether the document is eligible
> for ingestion. The calls made here to
> WorkerThread$ProcessActivity#checkMimeTypeIndexable and
> WorkerThread$ProcessActivity#checkLengthIndexable are unnecessarily costly as
> they each create a fresh instance of IncrementalIngester$PipelineConnections
> on every call. The constructor of IncrementalIngester$PipelineConnections can
> be very expensive due to the loading of output connection objects, which in
> turn requires some locking (via ZK, in a distributed environment).
> The other area of inefficiency is in
> WorkerThread$ProcessActivity#processDocumentReferences. This method creates
> new instances of PriorityCalculator using the less-efficient 3-arg
> constructor. This can be addressed using the same pattern implemented for
> CONNECTORS-1094.
> To highlight the impact of the above calls, I profiled an active worker
> thread for 40 minutes. During that window, it spent ~23 minutes in
> SharedDriveConnector#checkInclude and its callees, plus 9 minutes creating
> instances of PriorityCalculator.
> I've seen the above issues when using the shared drive connector, but I think
> other connectors could be affected too, depending on how they're implemented.