[jira] [Comment Edited] (CONNECTORS-1009) Cmis Repository Connector does not handle Document updating properly

Karl Wright (JIRA) Thu, 14 Aug 2014 14:06:38 -0700

    [ 
https://issues.apache.org/jira/browse/CONNECTORS-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097648#comment-14097648
 ]


Karl Wright edited comment on CONNECTORS-1009 at 8/14/14 9:05 PM:
------------------------------------------------------------------

Unless you have a means in CMIS of obtaining a documentID, rather than a 
version ID, there is no solution to your problem, and the CMIS connector is 
operating the best it can.

As for why the document is fetched repeatedly -- we should figure this out.  My 
guess is that the Alfresco implementation of CMIS doesn't support versioning, 
so this clause in the connector takes the else branch:

{code}
        if(StringUtils.isNotEmpty(document.getVersionLabel())){
          rval[i] = document.getVersionLabel() + ":" + cmisQuery;
        } else {
          // a CMIS document that doesn't contain versioning information will always be processed
          rval[i] = StringUtils.EMPTY;
        }
{code}
Can you find out whether the else clause fires?  If so, then the CMIS connector 
will always refetch Alfresco documents.
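
To illustrate why an empty version string would matter, here is a minimal 
standalone sketch. It is not the connector's code: the class and method names 
are hypothetical, and the comparison rule in needsProcessing (an empty version 
string always triggers processing) is an assumption based on the comment in the 
clause above.

```java
class VersionStringSketch {
    // Hypothetical helper mirroring the clause above: a non-empty version
    // label yields "label:query", anything else yields an empty string.
    static String versionString(String versionLabel, String cmisQuery) {
        if (versionLabel != null && !versionLabel.isEmpty()) {
            return versionLabel + ":" + cmisQuery;
        }
        return "";
    }

    // Assumed framework behaviour: an empty version string means "always
    // process"; otherwise process only when the version string has changed.
    static boolean needsProcessing(String previousVersion, String currentVersion) {
        return currentVersion.isEmpty() || !currentVersion.equals(previousVersion);
    }

    public static void main(String[] args) {
        String query = "SELECT * FROM cmis:folder"; // placeholder query text
        String withLabel = versionString("1.1", query);
        String withoutLabel = versionString("", query);
        System.out.println(needsProcessing(withLabel, withLabel));
        System.out.println(needsProcessing(withoutLabel, withoutLabel));
    }
}
```

Under these assumptions, a document whose version label is missing would be 
reprocessed on every crawl, even if nothing changed.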



was (Author: [email protected]):
Unless you have a means in CMIS of obtaining a documentID, rather than a 
version ID, there is no solution to your problem, and the CMIS connector is 
operating the best it can.

As for why the document is fetched repeatedly -- we should figure this out.

> Cmis Repository Connector does not handle Document updating properly
> --------------------------------------------------------------------
>
>                 Key: CONNECTORS-1009
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1009
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: CMIS connector
>            Reporter: Prasad Perera
>            Priority: Minor
>
> As part of the fix for CONNECTORS-1004, it seems that CmisRepositoryConnector 
> does not handle document updates properly.
> Case scenario:
> * Create a continuous crawling job using CmisRepositoryConnector.
> * Update a document on the repository side.
> * The document keeps being submitted to the OutputConnector at each crawling 
> interval, even though it was not updated afterwards.
> One possible fix is in CmisRepositoryConnector:processDocument, at the call
>  activities.ingestDocumentWithException(nodeId, version, documentURI, rd);
> The documentURI should point to the old document URI. (It now points to the 
> latest documentURI discovered, which may confuse document references?)
> Also, in ECM systems such as Alfresco, the document IDs include the version 
> number as well.
> Ex: workspace://SpacesStore/8e12a887-3fa8-48d6-8516-5bcfad358ba2;1.0 --> version 1.0
> workspace://SpacesStore/8e12a887-3fa8-48d6-8516-5bcfad358ba2;1.1 --> version 1.1
> When we set up a query to crawl a repository folder, we discover content by 
> following the child nodes. Because of that, it now seems to queue all the 
> document versions and submit them to the OutputConnector, producing duplicate 
> documents on the output (search) side.
> Is there a way to avoid this problem? It would be great if the repository could 
> just take the latest document version and submit it as an update.
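
The ID format in the description above (node reference, then ";", then the 
version label) can be taken apart with plain string handling. The following is 
only a sketch: the class and method names are hypothetical, and whether the 
part before the ";" can serve as a version-independent document ID in a given 
repository is an assumption that would need to be checked.

```java
class VersionedNodeRef {
    // Hypothetical helper: splits "nodeRef;versionLabel" at the last ';'.
    // Returns { nodeRef, versionLabel }; the label is "" when no ';' is present.
    static String[] split(String objectId) {
        int sep = objectId.lastIndexOf(';');
        if (sep < 0) {
            return new String[] { objectId, "" };
        }
        return new String[] { objectId.substring(0, sep), objectId.substring(sep + 1) };
    }

    public static void main(String[] args) {
        String[] parts = split("workspace://SpacesStore/8e12a887-3fa8-48d6-8516-5bcfad358ba2;1.1");
        System.out.println("node ref: " + parts[0]);
        System.out.println("version:  " + parts[1]);
    }
}
```

With such a split, the two example IDs in the description share the same node 
reference and differ only in the version label.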



--
This message was sent by Atlassian JIRA
(v6.2#6252)
