Re: Sharepoint connector help : site didn't exist or external

2018-10-08 Thread Karl Wright
Excellent news!
Thanks for the update.

Karl



Re: Sharepoint connector help : site didn't exist or external

2018-10-08 Thread Susheel Kumar
Thank you so much, Karl. I was able to crawl the site and index the documents.

On Wed, Oct 3, 2018 at 3:31 PM Karl Wright  wrote:

> Please read the user documentation for the SharePoint connector very
> carefully.  You will need a site rule AND a path rule.
>
> Thanks,
> Karl
>
>
> On Wed, Oct 3, 2018 at 3:29 PM Susheel Kumar 
> wrote:
>
>> Hi Karl,
>>
>> Please ignore my previous message.  I was just able to crawl the site, but
>> for the files I want extracted, the log shows the messages below.  I
>> already have a Path rule that includes /content for the file and library
>> types, but the files are still not being included.  How do I define the
>> path correctly so that they are included?
>>
>> DEBUG 2018-10-03T15:22:58,724 (Worker thread '40') - SharePoint: Checking
>> whether to include document '/Content/Review_Communication_template.docx'
>> DEBUG 2018-10-03T15:22:58,724 (Worker thread '40') - SharePoint: File
>> path '/Content/Review_Communication_template.docx' does not match any rules
>> - excluding
>> DEBUG 2018-10-03T15:22:58,724 (Worker thread '40') - SharePoint: Checking
>> whether to include document '/Content/Review_Template.pptx'
>>
>> On Wed, Oct 3, 2018 at 2:48 PM Susheel Kumar 
>> wrote:
>>
>>> Thank you so much, Karl, for getting me this far.  I am able to see the
>>> connector logging now.
>>>
>>> The next thing I am struggling with: when I run a job that uses the
>>> SharePoint repository and outputs to the local file system, I sometimes
>>> see 401 Unauthorized for usergroup.asmx, or 404 for lists.asmx, or 404 for
>>> Permission.asmx, when I run the same job again and again.  I can access
>>> these web services through Postman and they work.
>>>
>>> Any hint as to what may be missing?  I already have the ManifoldCF
>>> SharePoint plugin installed.
>>>
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - Enter:
>>> SOAPPart::saveChanges
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >> "POST
>>> /sites/mysite/_vti_bin/usergroup.asmx HTTP/1.1[\r][\n]"
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >>
>>> "Content-Type: text/xml; charset=utf-8[\r][\n]"
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >>
>>> "Accept: */*[\r][\n]"
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >>
>>> "SOAPAction: "
>>> http://schemas.microsoft.com/sharepoint/soap/directory/GetUserCollectionFromGroup
>>> "[\r][\n]"
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >>
>>> "User-Agent: Axis/1.4[\r][\n]"
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >>
>>> "Content-Length: 427[\r][\n]"
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >>
>>> "Host: dit.apps.com[\r][\n]"
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >>
>>> "Connection: Keep-Alive[\r][\n]"
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >>
>>> "Accept-Encoding: gzip,deflate[\r][\n]"
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >>
>>> "Authorization: NTLM
>>> TlRMTVNTUAADGAAYAEgAAADOAM4AYAQABAAuAQAADgAOADIBAAAeAB4AQAEAAABeAQAABYKIogUBKAoPMhvCQLxXxh1vp0BTXWvSa7dE6WMG25EYSBDigKSFQ4YXcg/4Gs4bOgEBYHUY7j1b1AFjszvpmEMnhAACAAQARQBTAAEAFgBDAEQATABEAEUAVgBFAFMAUgAwADIABAAaAEUAUwAuAEEARAAuAEEARABQAC4AYwBvAG0AAwAyAEMARABMAEQARQBWAEUAUwBSADAAMgAuAEUAUwAuAEEARAAuAEEARABQAC4AYwBvAG0ABQAUAEEARAAuAEEARABQAC4AYwBvAG0ABwAIAMWOHe89W9QBAABFAFMAawB1AG0AYQByAHMANQBSAE8AUwBFAEwAQwBEAFYAMAAwADAAMQBMAEoAQwA=[\r][\n]"
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >>
>>> "[\r][\n]"
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >>
>>> "http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="
>>> http://www.w3.org/2001/XMLSchema" xmlns:xsi="
>>> http://www.w3.org/2001/XMLSchema-instance"
>>> xmlns="http://schemas.microsoft.com/sharepoint/soap/directory/">Excel
>>> Services
>>> Viewers"
>>> DEBUG 2018-10-03T13:24:24,806 (Thread-49328) - http-outgoing-90 <<
>>> "HTTP/1.1 401 Unauthorized[\r][\n]"
>>>
>>> OR
>>>
>>> DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >>
>>> "POST /sites/mysite/_vti_bin/lists.asmx HTTP/1.1[\r][\n]"
>>> DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >>
>>> "Content-Type: text/xml; charset=utf-8[\r][\n]"
>>> DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >>
>>> "Accept: */*[\r][\n]"
>>> DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >>
>>> "SOAPAction: "
>>> http://schemas.microsoft.com/sharepoint/soap/GetListCollection"[\r][\n]"
>>> DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >>
>>> "User-Agent: Axis/1.4[\r][\n]"
>>> DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >>
>>> "Content-Length: 335[\r][\n]"
>>> DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >>
>>> "Host: dit.apps.com[\r][\n]"
>>> DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >>
>>> "Connection: Keep-Alive[\r][\n]"
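
The DEBUG lines quoted in this thread show the connector logging every path it
excludes with "does not match any rules - excluding". When checking site and
path rules, it can help to list those paths in one go. The following is only a
rough sketch, assuming each log entry sits on a single line in the real log
file (the lines above are wrapped by the mail client); the function name and
the sample list are made up for illustration.

```python
import re

# Matches the exclusion message as it appears in the quoted DEBUG output.
EXCLUDED = re.compile(
    r"SharePoint: File path '([^']+)' does not match any rules - excluding"
)

def excluded_paths(log_lines):
    """Return the document paths reported as excluded, in log order."""
    paths = []
    for line in log_lines:
        match = EXCLUDED.search(line)
        if match:
            paths.append(match.group(1))
    return paths

# Sample entries copied from the thread and joined back onto single lines.
sample = [
    "DEBUG 2018-10-03T15:22:58,724 (Worker thread '40') - SharePoint: "
    "Checking whether to include document "
    "'/Content/Review_Communication_template.docx'",
    "DEBUG 2018-10-03T15:22:58,724 (Worker thread '40') - SharePoint: "
    "File path '/Content/Review_Communication_template.docx' "
    "does not match any rules - excluding",
    "DEBUG 2018-10-03T15:22:58,724 (Worker thread '40') - SharePoint: "
    "Checking whether to include document '/Content/Review_Template.pptx'",
]

print(excluded_paths(sample))
```

Paths that show up in this list are the ones to compare against the configured
site and path rules.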

Re: Query to get the number of documents processed from PostgreSQL

2018-10-08 Thread Karl Wright
If you want all the documents for a specific job, the query is:

select count(*) from jobqueue where jobid=

Karl
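
A minimal sketch of running the count above for a single job from a script,
with the job id passed as a bound parameter. It uses Python's built-in sqlite3
module and a made-up in-memory table as a stand-in for the ManifoldCF
PostgreSQL database, purely so that it runs anywhere. Only the jobqueue table
and its jobid column come from the query above; the docid column, the rows,
and the job ids 1001 and 2002 are assumptions for illustration. Against a real
PostgreSQL instance you would use a PostgreSQL driver instead, and its
placeholder syntax may differ from the ? used here.

```python
import sqlite3

def count_job_documents(conn, job_id):
    """Count the jobqueue rows that belong to one job."""
    cur = conn.execute(
        "SELECT count(*) FROM jobqueue WHERE jobid = ?", (job_id,)
    )
    return cur.fetchone()[0]

# Stand-in database: hypothetical table holding only what the query needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobqueue (jobid INTEGER, docid TEXT)")
conn.executemany(
    "INSERT INTO jobqueue (jobid, docid) VALUES (?, ?)",
    [(1001, "a"), (1001, "b"), (1001, "c"), (2002, "d")],
)

print(count_job_documents(conn, 1001))
```

Restricting the count to one jobid keeps the work limited to the job of
interest rather than to every job.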




Query to get the number of documents processed from PostgreSQL

2018-10-08 Thread Romaric Pighetti

Hi Karl,

I need to get the number of documents processed by MCF in a specific job.
This number is now larger than the limit set for the web interface, and I
don't want to raise that limit because of the stress it would put on the
database (opening the tab in the UI fires queries for all the jobs, and I
know from previous reading that these queries are heavy for PostgreSQL to
process).
Could you provide the query the interface uses to display the number of
processed documents, so that I can run it against PostgreSQL manually for
only the job I am interested in, lowering the impact on the database?


Thanks for your help.
Romaric

--
Romaric Pighetti
France Labs – The Search experts

The creators of Datafari 4, THE enterprise search solution

www.francelabs.com