[ https://issues.apache.org/jira/browse/CONNECTORS-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13654766#comment-13654766 ]

David Morana commented on CONNECTORS-688:
-----------------------------------------

Hey, that's a great idea! I'm not sure about the 2nd step, because I think ALL 
owner IDs are negative, and you'll have to get/check the owner of every 
document... I thought I could save you a LAPI call by just adding the object ID 
to the UI.
However, I'll test it out and let you know...
By the way, how are you getting the list of all the documents to crawl? Do you 
call listObjects at the beginning using parent ID 2000 (Enterprise WS)?
Thanks,
P.S.
Please note that this is the Recycle Bin module; it replaced the Undeleted 
folder that comes with Livelink by default. 
We're not using the Undeleted folder, so I don't remember offhand how documents 
are handled there. 
                
> Can we exclude the Recycle Bin from being crawled in the Livelink Connector?
> ----------------------------------------------------------------------------
>
>                 Key: CONNECTORS-688
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-688
>             Project: ManifoldCF
>          Issue Type: Improvement
>    Affects Versions: ManifoldCF 1.1.1, ManifoldCF 1.2, ManifoldCF 1.3
>         Environment: Running Solr 4.0 Final with ManifoldCF v1.1.1 on RHEL 6 
> 64-bit, on Tomcat v7.0.34
>            Reporter: David Morana
>            Assignee: Karl Wright
>            Priority: Minor
>             Fix For: ManifoldCF 1.3
>
>
> When a file in Livelink (Content Server 10 Update 6) is moved to the 
> Recycle Bin (RC v10.0.0; this module is NOT part of the basic Content 
> Server install), the file is still crawled and indexed, and it appears in 
> search results (although the link will be inaccessible to users).
> The Recycle Bin is a special folder on the Content Server; it holds documents 
> to be purged at a later date. LAPI still reports them as not deleted. 
> Can we add a filter to the UI and the Livelink connector to exclude certain 
> owner IDs (e.g., the ID of the Recycle Bin) from the crawl?
> In LivelinkConnector.java you check whether the version has been deleted; 
> an additional check would need to be added to see whether it was sent to the 
> Recycle Bin (for example, the Recycle Bin's object ID is 426023).
> Here's an example.
> After this call:
> {code}
> int status = LLDocs.GetVersionInfo(vol,id,revNumber,versioninfo);
> {code}
> just check the OWNER value in the versioninfo object, like so:
> {code}
> int ownerID = versioninfo.toInteger("OWNER");
> {code}
> If the owner is the NEGATIVE of the Recycle Bin's object ID (i.e., -426023), 
> the document is marked for deletion and should be excluded from the index.
> I think this would be a great feature, because you could make it a 
> generic way to exclude project workspaces or special folders from being 
> crawled: supply an object ID and compare it to the owner ID of each 
> file. 
> Thanks,
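The proposed rule (skip a document whose OWNER equals the negated object ID of a configured container) can be sketched in Java roughly as follows. This is a minimal sketch, not ManifoldCF code: the class `OwnerExclusionFilter`, its methods, and the comma-separated UI format are hypothetical, LAPI classes are left out so it compiles on its own, and it assumes, as the issue description states, that OWNER holds the negative of the containing folder's object ID. Comparing by negation, rather than testing `ownerId < 0`, matters if, as noted in the comment, all owner IDs are negative.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper (not part of ManifoldCF or LAPI) sketching the proposed
 * exclusion rule for containers such as the Recycle Bin.
 */
public final class OwnerExclusionFilter {
  private final Set<Integer> excludedContainerIds;

  public OwnerExclusionFilter(Set<Integer> excludedContainerIds) {
    this.excludedContainerIds =
        Collections.unmodifiableSet(new HashSet<>(excludedContainerIds));
  }

  /** Parses a comma-separated list of object IDs, e.g. as entered in a UI field. */
  public static OwnerExclusionFilter fromConfig(String commaSeparatedIds) {
    Set<Integer> ids = new HashSet<>();
    for (String token : commaSeparatedIds.split(",")) {
      String trimmed = token.trim();
      if (!trimmed.isEmpty()) {
        ids.add(Integer.parseInt(trimmed));
      }
    }
    return new OwnerExclusionFilter(ids);
  }

  /**
   * Returns true if a document with this OWNER value should be excluded.
   * Assumes OWNER is the negated object ID of the container holding the document.
   */
  public boolean isExcluded(int ownerId) {
    // Negate and look up; a plain "ownerId < 0" test would exclude everything
    // if all owner IDs are negative.
    return excludedContainerIds.contains(-ownerId);
  }

  public static void main(String[] args) {
    OwnerExclusionFilter filter = OwnerExclusionFilter.fromConfig("426023");
    System.out.println(filter.isExcluded(-426023)); // true: owned by the Recycle Bin
    System.out.println(filter.isExcluded(-999999)); // false: not a configured container
  }
}
```

In the connector, the `ownerID` obtained from `versioninfo.toInteger("OWNER")` in the snippet above would be passed to `isExcluded`, so no extra LAPI call is needed beyond the existing `GetVersionInfo`.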

--
This message is automatically generated by JIRA.
