[ 
https://issues.apache.org/jira/browse/CONNECTORS-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821234#comment-15821234
 ] 

Karl Wright commented on CONNECTORS-1364:
-----------------------------------------

Hi [~aeham.abushwashi], I think there's a simpler and more flexible way to add 
this feature.  Instead of two fields whose values get appended to the server 
name, why not provide a single field that is empty by default and, when set, 
extends the bin name?  Advanced users could then manage load in whatever way 
they see fit.  It is also much simpler: users would not have to work out 
values for two fields that do not otherwise affect the crawl at all.
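
A minimal sketch of the single-field idea, assuming a hypothetical optional 
"bin suffix" value read from the repository connection configuration. The 
class name, method name, parameter name and the "-" separator are all 
illustrative assumptions, not the connector's actual code:

```java
// Hypothetical helper for illustration only; not part of the ManifoldCF API.
final class BinNameSketch {
    private BinNameSketch() {}

    /**
     * Returns the bin names for a document on the given server.
     *
     * @param serverName the server name from the repository connection
     * @param binSuffix  assumed optional config field; null or blank means
     *                   "bin by server name only", i.e. the current behaviour
     */
    static String[] binNamesFor(String serverName, String binSuffix) {
        if (binSuffix == null || binSuffix.trim().isEmpty()) {
            // Field left empty: the bin is just the server name.
            return new String[] { serverName };
        }
        // Field set: extend the bin name so these documents get their own bin.
        return new String[] { serverName + "-" + binSuffix.trim() };
    }
}
```

With this shape, two connections to the same server that use different 
suffixes would land in different bins, while connections that leave the 
field empty keep sharing the server-wide bin.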



> Better bin naming in the Shared Drive Connector
> -----------------------------------------------
>
>                 Key: CONNECTORS-1364
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1364
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: JCIFS connector
>    Affects Versions: ManifoldCF 1.9
>            Reporter: Aeham Abushwashi
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 2.7
>
>         Attachments: CONNECTORS-1364.git.patch
>
>
> Hello and happy new year!
> Bin naming in the Shared Drive Connector makes assumptions that are not 
> always valid. 
> As I understand it, Manifold uses bins to prevent overloading data sources. 
> In the SDC, server name is designated as bin name. All jobs created against a 
> particular server will be treated as one unit when documents are prioritised, 
> which can severely disadvantage some jobs (e.g. late starters). 
> Moreover, this is incompatible with some common enterprise server topologies. 
> In Windows DFS, which is widely used in large enterprises, what the SDC 
> thinks of as a server name isn’t actually a physical resource. It’s a 
> namespace that can span many servers and shares. In this case, it doesn’t 
> make sense to throttle simply on the root ‘server’ name. In other 
> environments, a powerful storage server can be more than capable of handling 
> high crawl load; overzealous throttling can end up limiting/hurting 
> Manifold’s performance there.
> I’m struggling to find a single solution that fits all, so I’m leaning towards 
> passing in to the repo connection config some sort of server topology flag or 
> throttling depth flag as a hint that SharedDriveConnector#getBinNames can use 
> to decide whether the bin name should be server, server+share or 
> server+share+root_folder. Share and root_folder would need to be explicitly 
> passed in the repo config too or extracted from the documentIdentifier arg in 
> getBinNames (assuming it's reliable).
> Thoughts?
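
For comparison, the throttling-depth idea from the quoted description could 
be sketched roughly as follows. This assumes document identifiers look like 
smb://server/share/folder/file and that directory identifiers end with "/"; 
the enum, class and method names are hypothetical and do not come from the 
attached patch:

```java
import java.util.Arrays;

// Hypothetical sketch for illustration only; not the connector's actual code.
final class DepthBinNameSketch {
    /** Assumed config flag: how much of the path goes into the bin name. */
    enum Depth { SERVER, SHARE, ROOT_FOLDER }

    private DepthBinNameSketch() {}

    static String binNameFor(String documentIdentifier, Depth depth) {
        // Strip the scheme if present (assumed identifier format).
        String prefix = "smb://";
        String path = documentIdentifier.startsWith(prefix)
            ? documentIdentifier.substring(prefix.length())
            : documentIdentifier;

        // Segments are server, share, root folder, ...
        String[] parts = path.split("/");

        // Treat the last segment as a file unless the identifier ends with "/".
        int dirSegments = path.endsWith("/")
            ? parts.length
            : Math.max(1, parts.length - 1);

        // SERVER -> 1 segment, SHARE -> 2, ROOT_FOLDER -> 3, capped by what exists.
        int wanted = Math.min(depth.ordinal() + 1, dirSegments);
        return String.join("/", Arrays.copyOfRange(parts, 0, wanted));
    }
}
```

A file directly under a share has no root folder, so ROOT_FOLDER falls back 
to server+share in that case.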



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
