[ 
https://issues.apache.org/jira/browse/CONNECTORS-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17349815#comment-17349815
 ] 

Karl Wright commented on CONNECTORS-1668:
-----------------------------------------

The logic for path rules is as follows:

{code}
      if (sn.getType().equals("pathrule"))
      {
        // New-style rule.
        // Here's the trick: We do what the first matching rule tells us to do.
        String pathMatch = sn.getAttributeValue("match");
        String action = sn.getAttributeValue("action");
        String ruleType = sn.getAttributeValue("type");

        // First, find out if we match EXACTLY.
        if (checkMatch(libraryPath,0,pathMatch))
        {
          // If this is true, the type also has to match if the rule is to apply.
          if (ruleType.equals("library"))
          {
            if (Logging.connectors.isDebugEnabled())
              Logging.connectors.debug("SharePoint: Library '"+libraryPath+"' exactly matched rule path '"+pathMatch+"'");
            if (action.equals("include"))
            {
              // For include rules, partial match is good enough to proceed.
              if (Logging.connectors.isDebugEnabled())
                Logging.connectors.debug("SharePoint: Including library '"+libraryPath+"'");
              return true;
            }
            if (Logging.connectors.isDebugEnabled())
              Logging.connectors.debug("SharePoint: Excluding library '"+libraryPath+"'");
            return false;
          }
        }
        else if (ruleType.equals("file") && checkPartialPathMatch(libraryPath,0,pathMatch,1) && action.equals("include"))
        {
          if (Logging.connectors.isDebugEnabled())
            Logging.connectors.debug("SharePoint: Library '"+libraryPath+"' partially matched file rule path '"+pathMatch+"' - including");
          return true;
        }
        else if (ruleType.equals("folder") && checkPartialPathMatch(libraryPath,0,pathMatch,1) && action.equals("include"))
        {
          if (Logging.connectors.isDebugEnabled())
            Logging.connectors.debug("SharePoint: Library '"+libraryPath+"' partially matched folder rule path '"+pathMatch+"' - including");
          return true;
        }
      }
    }
{code}

I need to see the type of the rule you configured. As the code shows, including 
a library requires a library rule, and including a site requires a site rule.
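
The control flow above can be condensed into a small standalone sketch. This is an illustration only, not the connector's code: the PathRule class, the checkIncludeLibrary name, the injected matcher parameters, and the "return false when no rule decides" fallthrough are all assumptions made for the example, and the matchers used in main() are placeholders with no wildcard support.

```java
import java.util.List;
import java.util.function.BiPredicate;

public class LibraryRuleSketch {
  /** Hypothetical stand-in for a "pathrule" specification node. */
  public static final class PathRule {
    final String match;
    final String type;
    final String action;

    public PathRule(String match, String type, String action) {
      this.match = match;
      this.type = type;
      this.action = action;
    }
  }

  /** Returns true if the library should be included; the first decisive rule wins. */
  public static boolean checkIncludeLibrary(String libraryPath, List<PathRule> rules,
      BiPredicate<String, String> exactMatch, BiPredicate<String, String> partialMatch) {
    for (PathRule rule : rules) {
      boolean include = rule.action.equals("include");
      if (exactMatch.test(libraryPath, rule.match)) {
        // An exact match only decides the outcome when the rule type is "library".
        if (rule.type.equals("library")) {
          return include;
        }
      } else if ((rule.type.equals("file") || rule.type.equals("folder"))
          && include && partialMatch.test(libraryPath, rule.match)) {
        // A file or folder include rule that reaches below this library keeps it in play.
        return true;
      }
    }
    // Assumption: when no rule decides, the library is left out.
    return false;
  }

  public static void main(String[] args) {
    // Placeholder matchers (no wildcards), for illustration only.
    BiPredicate<String, String> exact = String::equals;
    BiPredicate<String, String> partial = (path, pattern) -> pattern.startsWith(path + "/");

    String library = "/Projects/P1/Project Library";
    List<PathRule> siteOnly = List.of(new PathRule(library, "site", "include"));
    List<PathRule> withLibrary = List.of(new PathRule(library, "library", "include"));

    System.out.println(checkIncludeLibrary(library, siteOnly, exact, partial));
    System.out.println(checkIncludeLibrary(library, withLibrary, exact, partial));
  }
}
```

In this sketch a rule whose path matches exactly but whose type is "site" does not include the library, while the same path with type "library" does.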

The checkMatch() method does this:

{code}
  /** Recursive worker method for checkMatch.  Returns 'true' if there is a path that consumes both
  * strings in their entirety in a matched way.
  *@param caseSensitive is true if file names are case sensitive.
  *@param sourceMatch is the source string (w/o wildcards)
  *@param match is the match string (w/wildcards)
  *@return true if there is a match.
  */
  protected static boolean checkMatch(boolean caseSensitive, String sourceMatch, String match)
{code}

The partial path match method looks like this:

{code}
  protected static boolean checkPartialPathMatch( String sourceMatch, int sourceIndex, String match, int requiredExtraPathSections )
  {
    // The partial match must be of a complete path, with at least a specified number of trailing path components possible in what remains.
    // Path components can include everything but the "/" character itself.
    //
    // The match string is the one containing the wildcards.  Both the "*" wildcard and the "?" wildcard will match a "/", which is intended but is why this
    // matcher is a little tricky to write.
    //
    // Note also that it is OK to return "true" more than strictly necessary, but it is never OK to return "false" incorrectly.

    // This is a partial path match.  That means that we don't have to completely use up the match string, but what's left on the match string after the source
    // string is used up MUST either be capable of being null, or be capable of starting with a "/"integral path sections, and MUST include at least n of these sections.
    //
{code}


If you look at the code, you will note there's quite a bit of debug logging 
around path matching.  The basic point though is that the entire match string 
must be consumed for a full match, meaning that anything that is not a 
wildcard MUST match.  For a partial match, at least N path sections must be 
left over in the match string once the source string has been entirely consumed.
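
To make the two notions concrete, here is a rough sketch of a full match and a partial path match over "*" and "?" wildcards, both of which are allowed to match "/". It is a simplified re-implementation written for this explanation, not the connector's checkMatch() or checkPartialPathMatch(): the class name, method names, and signatures are hypothetical, matching is case sensitive, and the source path is assumed to end on a complete path component.

```java
public class PathMatchSketch {
  private static final int START = 0, AFTER_SLASH = 1, IN_SEGMENT = 2;

  /** Full match: both strings must be consumed entirely. */
  public static boolean fullMatch(String source, int si, String match, int mi) {
    if (mi == match.length()) return si == source.length();
    char m = match.charAt(mi);
    if (m == '*') {
      // '*' matches any run of characters, including '/'.
      for (int next = si; next <= source.length(); next++)
        if (fullMatch(source, next, match, mi + 1)) return true;
      return false;
    }
    if (si == source.length()) return false;
    // '?' matches any single character, including '/'.
    return (m == '?' || m == source.charAt(si)) && fullMatch(source, si + 1, match, mi + 1);
  }

  /** Partial match: source is consumed, and the rest of match can supply >= required sections. */
  public static boolean partialMatch(String source, int si, String match, int mi, int required) {
    if (si == source.length()) return canExtend(match, mi, required, START);
    if (mi == match.length()) return false;
    char m = match.charAt(mi);
    if (m == '*')
      return partialMatch(source, si, match, mi + 1, required)
          || partialMatch(source, si + 1, match, mi, required);
    return (m == '?' || m == source.charAt(si))
        && partialMatch(source, si + 1, match, mi + 1, required);
  }

  /** Can match[mi..] produce "/name" sections, with at least 'needed' more of them? */
  private static boolean canExtend(String match, int mi, int needed, int phase) {
    if (mi == match.length()) return needed == 0 && phase != AFTER_SLASH;
    char m = match.charAt(mi);
    // '*' can produce as many whole sections as required, so nothing more is needed after it.
    if (m == '*') return canExtend(match, mi + 1, 0, IN_SEGMENT);
    boolean ok = false;
    if (m == '/' || m == '?') // produce a '/', which is not allowed right after another '/'
      ok = phase != AFTER_SLASH && canExtend(match, mi + 1, needed, AFTER_SLASH);
    if (!ok && m != '/') {    // produce a non-'/' character, which must follow a '/'
      int left = (phase == AFTER_SLASH) ? Math.max(0, needed - 1) : needed;
      ok = phase != START && canExtend(match, mi + 1, left, IN_SEGMENT);
    }
    return ok;
  }

  public static void main(String[] args) {
    String rule = "/Projects/*/Project Library";
    System.out.println(fullMatch("/Projects/P1/Project Library", 0, rule, 0));
    System.out.println(fullMatch("/Projects/P1/Folder 1/Project Library", 0, rule, 0));
    System.out.println(partialMatch("/Projects/P1/Project Library", 0, rule + "/*", 0, 1));
  }
}
```

In this sketch all three calls in main() return true. The second one succeeds because "*" is allowed to span "P1/Folder 1", slash included. A pattern with no wildcards, such as "/Projects/P1" matched against the identical source with one required extra section, fails the partial match because nothing is left over in the match string.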

To summarize:

(1) You need a Site rule to include a site.
(2) You need a Library rule to include a library.







> Use of Wild Characters in SharePoint Connector.
> -----------------------------------------------
>
>                 Key: CONNECTORS-1668
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1668
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: SharePoint connector
>    Affects Versions: ManifoldCF 2.16
>            Reporter: Shashank Dwivedi
>            Assignee: Karl Wright
>            Priority: Major
>             Fix For: ManifoldCF 2.16
>
>         Attachments: image-2021-05-23-00-36-45-378.png
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Hi, 
> My SharePoint site is of the following *Format* :
> -*Projects(root)*
>    -*Project 1*
>         -Project Library
>         -Folder 1
>         -Folder 2 ... Folder N
>    -*Project 2 ... Project N*
>         -Project Library
>         -Folder 1 .. Folder N
> We have the *Projects(root site)* in this fashion from Project 1 to *Project 
> N(20000)*, where N is a *large number.* I wish to process all files present 
> inside the *Project Library folder* of all the projects.
> So, as a Path rule I am currently supplying "*Projects/**/*Project Library/* 
> *". There is no space between / and * in the last.
> However, this is *not working out*. It is also pulling documents inside 
> *Folder 1, Folder2,..Folder N.* I want it to Process files only inside 
> Project Library.
> Please suggest me the right way to accomplish this Task.
> I could not identify any suggestion regarding the same in the End user 
> Documentation.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
