Re: sharepoint crawler documents limit

2020-01-27 Thread Karl Wright
I'm glad you got by this.  Thanks for letting us know what the issue was.
Karl

Re: sharepoint crawler documents limit

2020-01-27 Thread Jorge Alonso Garcia
Hi,
We changed the timeout on the SharePoint IIS and now the process is able to
crawl all documents.
Thanks for your help



Re: sharepoint crawler documents limit

2019-12-30 Thread Gaurav G
We had faced a similar issue, wherein our repo had 100,000 documents but
our crawler stopped partway through. The issue turned out to be that the
SharePoint query fired by the SharePoint web service gets progressively
slower, and eventually the connection starts timing out before the next
batch of records gets returned. We increased a timeout parameter on
SharePoint to 10 minutes and after that we were able to crawl all documents
successfully.  I believe we had increased the parameter indicated in the
link below.

https://weblogs.asp.net/jeffwids/how-to-increase-the-timeout-for-a-sharepoint-2010-website
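
For anyone hitting the same timeout: the linked article covers raising the
ASP.NET request execution timeout on the SharePoint web application, which is
the same kind of change Jorge describes above. A minimal sketch of such a
web.config edit follows; the 600-second value and the exact placement are
illustrative assumptions rather than settings taken from this thread, so
follow the article and test on your own farm.

    <!-- web.config of the SharePoint web application; values are illustrative -->
    <configuration>
      <system.web>
        <!-- executionTimeout is in seconds; raise it so long-running list
             queries can finish before the connection is cut off -->
        <httpRuntime executionTimeout="600" />
      </system.web>
    </configuration>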



Re: sharepoint crawler documents limit

2019-12-20 Thread Karl Wright
Hi Priya,

This has nothing to do with anything in ManifoldCF.

Karl


Re: sharepoint crawler documents limit

2019-12-20 Thread Priya Arora
Hi All,

Is this issue something to do with the value/parameters set below in
properties.xml?
[image: image.png]


Re: sharepoint crawler documents limit

2019-12-20 Thread Jorge Alonso Garcia
And what other SharePoint parameter could I check?

Jorge Alonso Garcia



Re: sharepoint crawler documents limit

2019-12-20 Thread Karl Wright
The code seems correct and many people are using it without encountering
this problem.  There may be another SharePoint configuration parameter you
also need to look at somewhere.

Karl


Re: sharepoint crawler documents limit

2019-12-20 Thread Jorge Alonso Garcia
Hi Karl,
On SharePoint the list view threshold is 150,000, but we only receive 20,000
from MCF
[image: image.png]


Jorge Alonso Garcia



On Thu, Dec 19, 2019 at 7:19 PM Karl Wright () wrote:

> If the job finished without error it implies that the number of documents
> returned from this one library was 10,000 when the service is called the
> first time (starting at doc 0), 10,000 when it's called the second time
> (starting at doc 10,000), and zero when it is called the third time
> (starting at doc 20,000).
>
> The plugin code is unremarkable and actually gets results in chunks of
> 1000 under the covers:
>
> >>
>             SPQuery listQuery = new SPQuery();
>             listQuery.Query = "<OrderBy Override=\"TRUE\">";
>             listQuery.QueryThrottleMode = SPQueryThrottleOption.Override;
>             listQuery.ViewAttributes = "Scope=\"Recursive\"";
>             listQuery.ViewFields = "<FieldRef Name='FileRef' />";
>             listQuery.RowLimit = 1000;
>
>             XmlDocument doc = new XmlDocument();
>             retVal = doc.CreateElement("GetListItems",
>                 "http://schemas.microsoft.com/sharepoint/soap/directory/");
>             XmlNode getListItemsNode = doc.CreateElement("GetListItemsResponse");
>
>             uint counter = 0;
>             do
>             {
>                 if (counter >= startRowParam + rowLimitParam)
>                     break;
>
>                 SPListItemCollection collListItems = oList.GetItems(listQuery);
>
>                 foreach (SPListItem oListItem in collListItems)
>                 {
>                     if (counter >= startRowParam && counter < startRowParam + rowLimitParam)
>                     {
>                         XmlNode resultNode = doc.CreateElement("GetListItemsResult");
>                         XmlAttribute idAttribute = doc.CreateAttribute("FileRef");
>                         idAttribute.Value = oListItem.Url;
>                         resultNode.Attributes.Append(idAttribute);
>
>                         XmlAttribute urlAttribute = doc.CreateAttribute("ListItemURL");
>                         //urlAttribute.Value = oListItem.ParentList.DefaultViewUrl;
>                         urlAttribute.Value = string.Format("{0}?ID={1}",
>                             oListItem.ParentList.Forms[PAGETYPE.PAGE_DISPLAYFORM].ServerRelativeUrl,
>                             oListItem.ID);
>                         resultNode.Attributes.Append(urlAttribute);
>
>                         getListItemsNode.AppendChild(resultNode);
>                     }
>                     counter++;
>                 }
>
>                 listQuery.ListItemCollectionPosition = collListItems.ListItemCollectionPosition;
>
>             } while (listQuery.ListItemCollectionPosition != null);
>
>             retVal.AppendChild(getListItemsNode);
> <<
>
> The code is clearly working if you get 20,000 results returned, so I submit
> that perhaps there's a configured limit in your SharePoint instance that
> prevents listing more than 20,000.  That's the only way I can explain this.
>
> Karl

Re: sharepoint crawler documents limit

2019-12-19 Thread Jorge Alonso Garcia
Hi,
The job finishes OK (several times) but always with these 20,002 documents;
for some reason the loop only executes twice

Jorge Alonso Garcia



Re: sharepoint crawler documents limit

2019-12-19 Thread Karl Wright
If they are all in one library, then you'd be running this code:

>>
      int startingIndex = 0;
      int amtToRequest = 10000;
      while (true)
      {
        com.microsoft.sharepoint.webpartpages.GetListItemsResponseGetListItemsResult itemsResult =
          itemCall.getListItems(guid,Integer.toString(startingIndex),Integer.toString(amtToRequest));

        MessageElement[] itemsList = itemsResult.get_any();

        if (Logging.connectors.isDebugEnabled()){
          Logging.connectors.debug("SharePoint: getChildren xml response: " + itemsList[0].toString());
        }

        if (itemsList.length != 1)
          throw new ManifoldCFException("Bad response - expecting one outer 'GetListItems' node, saw "+Integer.toString(itemsList.length));

        MessageElement items = itemsList[0];
        if (!items.getElementName().getLocalName().equals("GetListItems"))
          throw new ManifoldCFException("Bad response - outer node should have been 'GetListItems' node");

        int resultCount = 0;
        Iterator iter = items.getChildElements();
        while (iter.hasNext())
        {
          MessageElement child = (MessageElement)iter.next();
          if (child.getElementName().getLocalName().equals("GetListItemsResponse"))
          {
            Iterator resultIter = child.getChildElements();
            while (resultIter.hasNext())
            {
              MessageElement result = (MessageElement)resultIter.next();
              if (result.getElementName().getLocalName().equals("GetListItemsResult"))
              {
                resultCount++;
                String relPath = result.getAttribute("FileRef");
                String displayURL = result.getAttribute("ListItemURL");
                fileStream.addFile( relPath, displayURL );
              }
            }
          }
        }

        if (resultCount < amtToRequest)
          break;

        startingIndex += resultCount;
      }
<<

What this does is request library content URLs in chunks of 10,000.  It
stops when it receives fewer than 10,000 documents from any one request.

If the documents were all in one library, then one call to the web service
yielded 10,000 documents, and the second call yielded 10,000 documents, and
there was no third call for no reason I can figure out.  Since 10,000
documents were returned each time, the loop ought to just continue unless
there was some kind of error.  Does the job succeed, or does it abort?

Karl



Re: sharepoint crawler documents limit

2019-12-19 Thread Karl Wright
If you are using the MCF plugin, and selecting the appropriate version of
Sharepoint in the connection configuration, there is no hard limit I'm
aware of for any Sharepoint job.  We have lots of other people using
SharePoint and nobody has reported this ever before.

If your SharePoint connection says "SharePoint 2003" as the SharePoint
version, then sure, that would be expected behavior.  So please check that
first.

The other question I have is about your description of first getting 10,001
documents and then later 20,002.  That's not how ManifoldCF works.  At the
start of the crawl, seeds are added; this would start out just being the
root, and then other documents would be discovered as the crawl proceeded,
after subsites and libraries are discovered.  So I am still trying to
square that with your description of how this is working for you.
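
As a rough illustration of that discovery model (a conceptual sketch only; the
types and methods below are invented for the example and are not ManifoldCF
APIs), the crawl behaves like a work queue seeded with the root, where the
document count grows as sites and libraries are expanded rather than being
known up front:

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.List;

    // Conceptual sketch: Node, isDocument() and children() are hypothetical
    // stand-ins, not ManifoldCF classes.
    interface Node {
      boolean isDocument();
      List<Node> children();  // subsites, libraries, or documents
    }

    class CrawlSketch {
      static int crawl(Node root) {
        Deque<Node> queue = new ArrayDeque<>();
        queue.add(root);                // the seed: just the root site
        int documents = 0;
        while (!queue.isEmpty()) {
          Node n = queue.poll();
          if (n.isDocument()) {
            documents++;                // one more discovered document
          } else {
            queue.addAll(n.children()); // expanding a container reveals more work
          }
        }
        return documents;
      }
    }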

Are all of your documents in one library?  Or two libraries?

Karl





Re: sharepoint crawler documents limit

2019-12-19 Thread Jorge Alonso Garcia
Hi,
The UI shows 20,002 documents (in a first phase it showed 10,001, and after
some more processing it rose to 20,002).
It looks like a hard limit; there are more files on SharePoint matching the
criteria used.


Jorge Alonso Garcia




Re: sharepoint crawler documents limit

2019-12-19 Thread Karl Wright
Hi Jorge,

When you run the job, do you see more than 20,000 documents as part of it?

Do you see *exactly* 20,000 documents as part of it?

Unless you are seeing a hard number like that in the UI for that job on the
job status page, I doubt very much that the problem is a numerical
limitation in the number of documents.  I would suspect that the inclusion
criteria, e.g. the mime type or maximum length, is excluding documents.

Karl



Re: sharepoint crawler documents limit

2019-12-19 Thread Jorge Alonso Garcia
Hi Karl,
We have installed the SharePoint plugin, and can access
http://server/_vti_bin/MCPermissions.asmx properly

[image: image.png]

SharePoint has more than 20,000 documents, but when we execute the job it
only extracts these 20,000. How can I check where the issue is?

Regards


Jorge Alonso Garcia




Re: sharepoint crawler documents limit

2019-12-19 Thread Karl Wright
By "stop at 20,000" do you mean that it finds more than 20,000 but stops
crawling at that time?  Or what exactly do you mean here?

FWIW, the behavior you describe sounds like you may not have installed the
SharePoint plugin and may have selected a version of SharePoint that is
inappropriate.  All SharePoint versions after 2008 limit the number of
documents returned using the standard web services methods.  The plugin
allows us to bypass that hard limit.
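
For reference, that bypass is the query throttle override in the plugin code
quoted further up this thread; the key lines are roughly:

    SPQuery listQuery = new SPQuery();
    // Override lets the query run past the list view throttle that the
    // standard web service methods would otherwise enforce.
    listQuery.QueryThrottleMode = SPQueryThrottleOption.Override;
    listQuery.RowLimit = 1000;  // results are then paged 1000 rows at a time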

Karl


On Thu, Dec 19, 2019 at 6:37 AM Jorge Alonso Garcia 
wrote:

> Hi,
> We have an issue with the SharePoint connector.
> There is a job that crawls a SharePoint 2016 site, but it is not recovering
> all files; it stops at 20,000 documents without any error.
> Is there any parameter that should be changed to avoid this limitation?
>
> Regards
> Jorge Alonso Garcia
>
>