[
https://issues.apache.org/jira/browse/CONNECTORS-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097648#comment-14097648
]
Karl Wright edited comment on CONNECTORS-1009 at 8/14/14 9:05 PM:
------------------------------------------------------------------
Unless you have a means in CMIS of obtaining a document ID, rather than a
version ID, there is no solution to your problem, and the CMIS connector is
operating as well as it can.
As for why the document is fetched repeatedly -- we should figure this out. My
guess is that the Alfresco implementation of CMIS doesn't support
versioning:
{code}
if (StringUtils.isNotEmpty(document.getVersionLabel())) {
  rval[i] = document.getVersionLabel() + ":" + cmisQuery;
} else {
  // a CMIS document that doesn't contain versioning information will
  // always be processed
  rval[i] = StringUtils.EMPTY;
}
{code}
Can you find out whether the else clause fires? If so, then the CMIS connector
will always refetch Alfresco documents.
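To make the effect of that clause concrete, here is a minimal standalone sketch,
not the connector's actual code. The class and method names
(VersionStringSketch, versionString, needsProcessing) are hypothetical, and the
comparison rule is an assumption drawn from the comment in the snippet above:
an empty version string means the document is always processed.

```java
public class VersionStringSketch {

    // Mirrors the quoted clause: a missing or empty version label
    // yields an empty version string.
    static String versionString(String versionLabel, String cmisQuery) {
        if (versionLabel != null && !versionLabel.isEmpty()) {
            return versionLabel + ":" + cmisQuery;
        }
        return "";
    }

    // Assumed rule: an empty version string never counts as "unchanged",
    // so the document is processed on every crawl pass. Otherwise it is
    // processed only when the version string differs from the previous one.
    static boolean needsProcessing(String previous, String current) {
        return current.isEmpty() || !current.equals(previous);
    }
}
```

If getVersionLabel() comes back empty for Alfresco documents, needsProcessing
in this sketch is true on every pass, which would match the repeated fetching
described in the issue.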
> Cmis Repository Connector does not handle Document updating properly
> --------------------------------------------------------------------
>
> Key: CONNECTORS-1009
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1009
> Project: ManifoldCF
> Issue Type: Bug
> Components: CMIS connector
> Reporter: Prasad Perera
> Priority: Minor
>
> As part of the fix for CONNECTORS-1004, it seems CmisRepositoryConnector
> does not handle document updating properly.
> Case Scenario:
> * Create a continuous crawling job using CmisRepositoryConnector.
> * Update a document on the repository side.
> * The document keeps being submitted to the OutputConnector at each crawling
> interval, though it was not updated afterwards.
> One possible fix is in CmisRepositoryConnector:processDocument, at the call
> activities.ingestDocumentWithException(nodeId, version, documentURI, rd);
> The documentURI should point to the old document URI. (It now points to the
> latest documentURI discovered, which may confuse document references.)
> Also, in ECM systems, for example in Alfresco, the document IDs are formulated
> with the version number as well.
> Ex:
> workspace://SpacesStore/8e12a887-3fa8-48d6-8516-5bcfad358ba2;1.0 --> version 1.0
> workspace://SpacesStore/8e12a887-3fa8-48d6-8516-5bcfad358ba2;1.1 --> version 1.1
> When we set up a query to crawl a repository folder, we discover content by
> referring to the child nodes. Because of that, it now seems to queue all the
> document versions and submit them to the OutputConnector, thus producing
> duplicate documents on the output (search) side.
> Is there a way to avoid this problem? It would be great if the repository
> could just take the latest document version and submit it as an update.
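The "take only the latest version" idea from the description can be sketched as
follows. This is an illustration under stated assumptions, not connector code:
it assumes node IDs always follow the Alfresco-style "<base>;<version>" pattern
shown in the examples above, and that version labels are dot-separated
integers. The class and method names (LatestVersionSketch, baseId,
latestVersions, and so on) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LatestVersionSketch {

    // Part of the node ID before the last ';' (the whole ID if there is none).
    static String baseId(String nodeId) {
        int i = nodeId.lastIndexOf(';');
        return i < 0 ? nodeId : nodeId.substring(0, i);
    }

    // Part of the node ID after the last ';' (empty if there is none).
    static String versionLabel(String nodeId) {
        int i = nodeId.lastIndexOf(';');
        return i < 0 ? "" : nodeId.substring(i + 1);
    }

    // Numeric value of the k-th label component; missing or empty counts as 0.
    // Assumes every non-empty component is an integer.
    private static int part(String[] parts, int k) {
        return (k >= parts.length || parts[k].isEmpty()) ? 0 : Integer.parseInt(parts[k]);
    }

    // Compares labels such as "1.9" and "1.10" component by component.
    static int compareLabels(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int k = 0; k < Math.max(x.length, y.length); k++) {
            int c = Integer.compare(part(x, k), part(y, k));
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    // Keeps, for each base ID, only the node ID with the highest version label.
    static Map<String, String> latestVersions(List<String> nodeIds) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String id : nodeIds) {
            latest.merge(baseId(id), id, (old, cur) ->
                compareLabels(versionLabel(cur), versionLabel(old)) > 0 ? cur : old);
        }
        return latest;
    }
}
```

Keying the result on the base ID means every version of a document maps to one
entry, so only one document per base ID would reach the output side. Whether
the suffix convention holds for every repository is exactly the open question
raised in the comment above.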