[ https://issues.apache.org/jira/browse/CONNECTORS-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098574#comment-14098574 ]

Prasad Perera edited comment on CONNECTORS-1009 at 8/15/14 2:39 PM:
--------------------------------------------------------------------

Hello Karl,

Well, the version string does not seem to be null, yet I couldn't figure out why
this keeps happening.
I have produced some logs printed to stdout and will attach them here for you
to look at. I have also attached a diff of the std.out.print statements so you
can see where they are printed.
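A non-null version string may not be enough on its own. Assuming the connector relies on ManifoldCF's usual behaviour of skipping a document whose version string matches the one recorded at the previous crawl, the string also has to be identical from one crawl to the next; anything in it that varies per crawl would make the document look changed every time. The Java sketch below illustrates that check. `VersionStringCheck`, `versionString` and `needsReindex` are hypothetical names, not ManifoldCF API, and building the string from the object ID plus a modification timestamp is only an illustrative choice.

```java
import java.util.Objects;

public class VersionStringCheck {
    // Hypothetical helper: builds a version string only from properties that
    // change when the document changes. Object ID plus last-modification
    // timestamp is an illustrative choice, not what the connector actually uses.
    static String versionString(String objectId, String lastModified) {
        return objectId + "+" + lastModified;
    }

    // Hypothetical check: a document needs reindexing only if it is new
    // (no previous version string) or its version string has changed.
    static boolean needsReindex(String previous, String current) {
        return previous == null || !Objects.equals(previous, current);
    }

    public static void main(String[] args) {
        String id = "workspace://SpacesStore/8e12a887-3fa8-48d6-8516-5bcfad358ba2;1.1";
        String firstCrawl = versionString(id, "2014-08-15T14:00:00Z");
        String secondCrawl = versionString(id, "2014-08-15T14:00:00Z");
        System.out.println(needsReindex(null, firstCrawl));        // true: new document
        System.out.println(needsReindex(firstCrawl, secondCrawl)); // false: unchanged
        // If the string also embedded a per-crawl value, every crawl would look like a change.
        System.out.println(needsReindex(firstCrawl, secondCrawl + "+" + System.currentTimeMillis())); // true
    }
}
```

If the logs show the version string differing between two crawls of an unmodified document, that difference would be one candidate explanation for the repeated submissions.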

Regarding the version control issue for CmisRepositoryConnector, one option
to solve it is to submit ALL reachable versions as they are found, so that the
output connectors or the search end can decide what to do with the different
versions of a document (?)
In that case, we need to make sure we populate the document data for each
node ID by retrieving all of its document versions, rather than only the
latest version.
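A minimal sketch of that idea, assuming Alfresco-style IDs where the version label follows the last `;` (as in the examples in the issue description): every discovered versioned ID is grouped under its base node ID, so all versions of one node can be populated and submitted together. `VersionGrouping` and its methods are hypothetical; the real connector would obtain versions through CMIS rather than from a list of strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VersionGrouping {
    // Base node ID: everything before the last ';'.
    // Assumption: Alfresco-style IDs such as "workspace://SpacesStore/<uuid>;1.1".
    static String baseId(String versionedId) {
        int sep = versionedId.lastIndexOf(';');
        return sep < 0 ? versionedId : versionedId.substring(0, sep);
    }

    // Version label: everything after the last ';', or "" if the ID has none.
    static String versionLabel(String versionedId) {
        int sep = versionedId.lastIndexOf(';');
        return sep < 0 ? "" : versionedId.substring(sep + 1);
    }

    // Groups every discovered versioned ID under its base node ID, keeping
    // discovery order, so all versions of one node can be handled together.
    static Map<String, List<String>> groupByNode(List<String> versionedIds) {
        Map<String, List<String>> byNode = new LinkedHashMap<>();
        for (String id : versionedIds) {
            byNode.computeIfAbsent(baseId(id), k -> new ArrayList<>()).add(versionLabel(id));
        }
        return byNode;
    }

    public static void main(String[] args) {
        List<String> discovered = List.of(
            "workspace://SpacesStore/8e12a887-3fa8-48d6-8516-5bcfad358ba2;1.0",
            "workspace://SpacesStore/8e12a887-3fa8-48d6-8516-5bcfad358ba2;1.1");
        // Both versions end up under a single node ID entry.
        System.out.println(groupByNode(discovered));
    }
}
```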



> Cmis Repository Connector does not handle Document updating properly
> --------------------------------------------------------------------
>
>                 Key: CONNECTORS-1009
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1009
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: CMIS connector
>            Reporter: Prasad Perera
>            Priority: Minor
>         Attachments: std_logs.txt, std_prints.diff
>
>
> As part of the fix for CONNECTORS-1004, it appears that CmisRepositoryConnector
> does not handle document updates properly.
> Case Scenario:
> * Create a continuous crawling job using CmisRepositoryConnector.
> * Update a document on the repository end.
> * The document keeps being submitted to the OutputConnector at each crawl
> interval, even though it has not been updated since.
> One possible fix is in CmisRepositoryConnector.processDocument:
>  activities.ingestDocumentWithException(nodeId, version, documentURI, rd);
> The documentURI should point to the old document URI. (Right now it points to
> the latest documentURI discovered, which may confuse document references?)
> Also, in ECM systems such as Alfresco, document IDs include the version
> number. For example:
> workspace://SpacesStore/8e12a887-3fa8-48d6-8516-5bcfad358ba2;1.0 --> version 1.0
> workspace://SpacesStore/8e12a887-3fa8-48d6-8516-5bcfad358ba2;1.1 --> version 1.1
> When we set up a query to crawl a repository folder, we discover content by
> following the child nodes. Because of that, the connector now seems to queue
> all the document versions and submit them to the OutputConnector, producing
> duplicate documents on the output (search) side.
> Is there a way to avoid this problem? It would be great if the repository
> connector could just take the latest document version and submit it as an update.
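To illustrate the alternative of submitting only the latest version, here is a hedged Java sketch that keeps one ID per base node, chosen by comparing version labels numerically. It assumes the `;`-separated, dot-numbered IDs shown above; `LatestVersionFilter` and its methods are hypothetical and not part of CmisRepositoryConnector.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LatestVersionFilter {
    // Base node ID: everything before the last ';' (assumed Alfresco-style ID).
    static String baseId(String versionedId) {
        int sep = versionedId.lastIndexOf(';');
        return sep < 0 ? versionedId : versionedId.substring(0, sep);
    }

    // Version label: everything after the last ';', or "0" if the ID has none.
    static String label(String versionedId) {
        int sep = versionedId.lastIndexOf(';');
        return sep < 0 ? "0" : versionedId.substring(sep + 1);
    }

    // Compares dotted labels segment by segment as integers, so "1.10" > "1.9".
    // Assumes every segment is a non-negative integer; missing segments count as 0.
    static int compareLabels(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        for (int i = 0; i < Math.max(pa.length, pb.length); i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // Keeps only the highest-versioned ID per base node ID, in discovery order.
    static Map<String, String> latestPerNode(List<String> versionedIds) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String id : versionedIds) {
            String base = baseId(id);
            String current = latest.get(base);
            if (current == null || compareLabels(label(id), label(current)) > 0) {
                latest.put(base, id);
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<String> discovered = List.of(
            "workspace://SpacesStore/8e12a887-3fa8-48d6-8516-5bcfad358ba2;1.0",
            "workspace://SpacesStore/8e12a887-3fa8-48d6-8516-5bcfad358ba2;1.1");
        // Prints only the ";1.1" ID: one document per node instead of one per version.
        latestPerNode(discovered).values().forEach(System.out::println);
    }
}
```

Comparing labels numerically rather than as plain strings matters here, because a string comparison would rank `1.9` above `1.10`.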



--
This message was sent by Atlassian JIRA
(v6.2#6252)
