[ https://issues.apache.org/jira/browse/CONNECTORS-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17349815#comment-17349815 ]
Karl Wright commented on CONNECTORS-1668:
-----------------------------------------
The logic for path rules is as follows:
{code}
  if (sn.getType().equals("pathrule"))
  {
    // New-style rule.
    // Here's the trick: We do what the first matching rule tells us to do.
    String pathMatch = sn.getAttributeValue("match");
    String action = sn.getAttributeValue("action");
    String ruleType = sn.getAttributeValue("type");
    // First, find out if we match EXACTLY.
    if (checkMatch(libraryPath,0,pathMatch))
    {
      // If this is true, the type also has to match if the rule is to apply.
      if (ruleType.equals("library"))
      {
        if (Logging.connectors.isDebugEnabled())
          Logging.connectors.debug("SharePoint: Library '"+libraryPath+"' exactly matched rule path '"+pathMatch+"'");
        if (action.equals("include"))
        {
          // For include rules, partial match is good enough to proceed.
          if (Logging.connectors.isDebugEnabled())
            Logging.connectors.debug("SharePoint: Including library '"+libraryPath+"'");
          return true;
        }
        if (Logging.connectors.isDebugEnabled())
          Logging.connectors.debug("SharePoint: Excluding library '"+libraryPath+"'");
        return false;
      }
    }
    else if (ruleType.equals("file") && checkPartialPathMatch(libraryPath,0,pathMatch,1) && action.equals("include"))
    {
      if (Logging.connectors.isDebugEnabled())
        Logging.connectors.debug("SharePoint: Library '"+libraryPath+"' partially matched file rule path '"+pathMatch+"' - including");
      return true;
    }
    else if (ruleType.equals("folder") && checkPartialPathMatch(libraryPath,0,pathMatch,1) && action.equals("include"))
    {
      if (Logging.connectors.isDebugEnabled())
        Logging.connectors.debug("SharePoint: Library '"+libraryPath+"' partially matched folder rule path '"+pathMatch+"' - including");
      return true;
    }
  }
}
{code}
I need to see the rule type; as you can see, to include a library, you need a
library rule, and to include a site, you need a site rule.
The checkMatch() method does this:
{code}
/** Recursive worker method for checkMatch.  Returns 'true' if there is a path that consumes both
* strings in their entirety in a matched way.
*@param caseSensitive is true if file names are case sensitive.
*@param sourceMatch is the source string (w/o wildcards)
*@param match is the match string (w/wildcards)
*@return true if there is a match.
*/
protected static boolean checkMatch(boolean caseSensitive, String sourceMatch, String match)
{code}
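To make the "consumes both strings in their entirety" idea concrete, here is a minimal standalone sketch of such a recursive matcher. It is not the ManifoldCF implementation: the class name WildcardMatchSketch, the method name fullMatch, and the example paths are made up for illustration, and it ignores the caseSensitive flag (comparison is always case sensitive here).

```java
// Illustrative sketch only; NOT the ManifoldCF code. All names are hypothetical.
final class WildcardMatchSketch
{
  /** Returns true if the match string (which may contain '*' and '?')
  * can consume the whole source string, starting at the given indexes.
  */
  static boolean fullMatch(String source, int sourceIndex, String match, int matchIndex)
  {
    // Match string used up: succeed only if the source is used up too.
    if (matchIndex == match.length())
      return sourceIndex == source.length();
    char m = match.charAt(matchIndex);
    if (m == '*')
    {
      // '*' may consume zero or more characters, including '/'.
      for (int i = sourceIndex; i <= source.length(); i++)
      {
        if (fullMatch(source, i, match, matchIndex + 1))
          return true;
      }
      return false;
    }
    // A non-'*' character needs a source character to consume.
    if (sourceIndex == source.length())
      return false;
    // '?' consumes exactly one character; anything else must match literally.
    if (m == '?' || m == source.charAt(sourceIndex))
      return fullMatch(source, sourceIndex + 1, match, matchIndex + 1);
    return false;
  }

  public static void main(String[] args)
  {
    String rule = "/Projects/*/Project Library";
    // true: '*' matches "P1"
    System.out.println(fullMatch("/Projects/P1/Project Library", 0, rule, 0));
    // true: '*' also crosses '/', so it matches "A/B"
    System.out.println(fullMatch("/Projects/A/B/Project Library", 0, rule, 0));
    // false: the trailing "/Folder 1" is not consumed by anything in the rule
    System.out.println(fullMatch("/Projects/P1/Project Library/Folder 1", 0, rule, 0));
  }
}
```

The second call shows why a single "*" does not stop at a path separator in this style of matcher.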
The partial path match method looks like this:
{code}
protected static boolean checkPartialPathMatch( String sourceMatch, int sourceIndex, String match, int requiredExtraPathSections )
{
  // The partial match must be of a complete path, with at least a specified number of trailing path components possible in what remains.
  // Path components can include everything but the "/" character itself.
  //
  // The match string is the one containing the wildcards.  Both the "*" wildcard and the "?" wildcard will match a "/", which is intended but is why this
  // matcher is a little tricky to write.
  //
  // Note also that it is OK to return "true" more than strictly necessary, but it is never OK to return "false" incorrectly.
  // This is a partial path match.  That means that we don't have to completely use up the match string, but what's left on the match string after the source
  // string is used up MUST either be capable of being null, or be capable of starting with a "/"integral path sections, and MUST include at least n of these sections.
  //
{code}
If you look at the code, you will note there's quite a bit of debug logging
around path matching. The basic point, though, is that for a full match the
entire match string must be consumed, meaning that anything that is not a
wildcard MUST match; for a partial match, at least N path sections must be
left over in the match string after the source path is entirely consumed.
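As a rough illustration of the partial-match idea, the sketch below consumes the source path against the match string and then asks whether what remains of the match string could still supply at least N more "/"-separated sections. It is not the ManifoldCF code: the class and method names and the example paths are hypothetical, and the leftover check is a deliberate over-approximation (it may return true more often than strictly necessary, but it is written so as not to return false when an extension exists).

```java
// Illustrative sketch only; NOT the ManifoldCF code. All names are hypothetical.
final class PartialPathMatchSketch
{
  /** Returns true if the source path can be consumed by a prefix of the match
  * string, and the rest of the match string could supply at least
  * requiredExtraSections further path sections.
  */
  static boolean partialMatch(String source, int sourceIndex, String match, int matchIndex, int requiredExtraSections)
  {
    // Source used up: decide based on what is left of the match string.
    if (sourceIndex == source.length())
      return canSupplySections(match.substring(matchIndex), requiredExtraSections);
    // Source left over but match string used up: no match.
    if (matchIndex == match.length())
      return false;
    char m = match.charAt(matchIndex);
    if (m == '*')
    {
      // Either the '*' matches nothing, or it consumes one character and stays active.
      return partialMatch(source, sourceIndex, match, matchIndex + 1, requiredExtraSections)
        || partialMatch(source, sourceIndex + 1, match, matchIndex, requiredExtraSections);
    }
    if (m == '?' || m == source.charAt(sourceIndex))
      return partialMatch(source, sourceIndex + 1, match, matchIndex + 1, requiredExtraSections);
    return false;
  }

  /** Over-approximate test: could 'rest' match n or more sections, each starting with "/"? */
  static boolean canSupplySections(String rest, int n)
  {
    if (rest.isEmpty())
      return n == 0;
    // Whatever follows the source must be able to begin with "/".
    char first = rest.charAt(0);
    if (first != '/' && first != '?' && first != '*')
      return false;
    int possibleSlashes = 0;
    for (int i = 0; i < rest.length(); i++)
    {
      char c = rest.charAt(i);
      // A '*' can expand to any number of sections.
      if (c == '*')
        return true;
      // Both a literal '/' and a '?' can stand for a separator.
      if (c == '/' || c == '?')
        possibleSlashes++;
    }
    return possibleSlashes >= n;
  }

  public static void main(String[] args)
  {
    // true: the rule can still describe something below this path
    System.out.println(partialMatch("/Projects/P1", 0, "/Projects/*/Project Library/*", 0, 1));
    // true: "/doc.txt" is one extra section
    System.out.println(partialMatch("/Projects/P1", 0, "/Projects/P1/doc.txt", 0, 1));
    // false: nothing is left over, but one extra section is required
    System.out.println(partialMatch("/Projects/P1", 0, "/Projects/P1", 0, 1));
    // false: the literal part of the rule does not match the path
    System.out.println(partialMatch("/Sites/Other", 0, "/Projects/*", 0, 1));
  }
}
```

In the rule logic quoted earlier, file and folder rules call the partial matcher with a required count of 1, which corresponds to the last argument in these calls.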
To summarize:
(1) You need a Site rule to include a site.
(2) You need a Library rule to include a library.
> Use of Wild Characters in SharePoint Connector.
> -----------------------------------------------
>
> Key: CONNECTORS-1668
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1668
> Project: ManifoldCF
> Issue Type: Bug
> Components: SharePoint connector
> Affects Versions: ManifoldCF 2.16
> Reporter: Shashank Dwivedi
> Assignee: Karl Wright
> Priority: Major
> Fix For: ManifoldCF 2.16
>
> Attachments: image-2021-05-23-00-36-45-378.png
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> Hi,
> My SharePoint site is of the following *Format* :
> -*Projects(root)*
> -*Project 1*
> -Project Library
> -Folder 1
> -Folder 2 ... Folder N
> -*Project 2 ... Project N*
> -Project Library
> -Folder 1 .. Folder N
> We have the *Projects(root site)* in this fashion from Project 1 to *Project
> N(20000)*, where N is a *large number.* I wish to process all files present
> inside the *Project Library folder* of all the projects.
> So, as a Path rule I am currently supplying "*Projects/**/*Project Library/*
> *". There is no space between / and * in the last.
> However, this is *not working out*. It is also pulling documents inside
> *Folder 1, Folder2,..Folder N.* I want it to Process files only inside
> Project Library.
> Please suggest me the right way to accomplish this Task.
> I could not identify any suggestion regarding the same in the End user
> Documentation.
>