Re: Sharepoint connector help : site didn't exist or external
Excellent news! Thanks for the update.

Karl

On Mon, Oct 8, 2018 at 1:54 PM Susheel Kumar wrote:

> Thank you so much Karl. I was able to crawl the site and index them.
Re: Sharepoint connector help : site didn't exist or external
Thank you so much, Karl. I was able to crawl the site and index the documents.

On Wed, Oct 3, 2018 at 3:31 PM Karl Wright wrote:

> Please read the user documentation for the SharePoint connector very
> carefully. You will need a site rule AND a path rule.
>
> Thanks,
> Karl
>
> On Wed, Oct 3, 2018 at 3:29 PM Susheel Kumar wrote:
>
>> Hi Karl,
>>
>> Please ignore my previous message. I was just able to crawl the site, but
>> for the files I want extracted it shows the messages below. I already had
>> the path configured to include /content, with the file and library types
>> included, but the files are still not being included. How do I define the
>> path correctly so that they are included?
>>
>> DEBUG 2018-10-03T15:22:58,724 (Worker thread '40') - SharePoint: Checking whether to include document '/Content/Review_Communication_template.docx'
>> DEBUG 2018-10-03T15:22:58,724 (Worker thread '40') - SharePoint: File path '/Content/Review_Communication_template.docx' does not match any rules - excluding
>> DEBUG 2018-10-03T15:22:58,724 (Worker thread '40') - SharePoint: Checking whether to include document '/Content/Review_Template.pptx'
>>
>> On Wed, Oct 3, 2018 at 2:48 PM Susheel Kumar wrote:
>>
>>> Thank you so much, Karl, for getting me this far. I am able to see the
>>> connector logging now.
>>>
>>> The next thing I am struggling with: when I run a job that uses the
>>> SharePoint repository and outputs to the local file system, I sometimes
>>> see 401 Unauthorized for usergroup.asmx, or 404 for lists.asmx, or 404
>>> for Permission.asmx, when I run the same job again and again. I am able
>>> to access these web services through Postman, and that works.
>>>
>>> Any hint as to what may be missing? I already have the ManifoldCF
>>> SharePoint plugin installed.
>>>
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - Enter: SOAPPart::saveChanges
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >> "POST /sites/mysite/_vti_bin/usergroup.asmx HTTP/1.1[\r][\n]"
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >> "Content-Type: text/xml; charset=utf-8[\r][\n]"
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >> "Accept: */*[\r][\n]"
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >> "SOAPAction: "http://schemas.microsoft.com/sharepoint/soap/directory/GetUserCollectionFromGroup"[\r][\n]"
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >> "User-Agent: Axis/1.4[\r][\n]"
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >> "Content-Length: 427[\r][\n]"
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >> "Host: dit.apps.com[\r][\n]"
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >> "Connection: Keep-Alive[\r][\n]"
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >> "Accept-Encoding: gzip,deflate[\r][\n]"
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >> "Authorization: NTLM TlRMTVNTUAADGAAYAEgAAADOAM4AYAQABAAuAQAADgAOADIBAAAeAB4AQAEAAABeAQAABYKIogUBKAoPMhvCQLxXxh1vp0BTXWvSa7dE6WMG25EYSBDigKSFQ4YXcg/4Gs4bOgEBYHUY7j1b1AFjszvpmEMnhAACAAQARQBTAAEAFgBDAEQATABEAEUAVgBFAFMAUgAwADIABAAaAEUAUwAuAEEARAAuAEEARABQAC4AYwBvAG0AAwAyAEMARABMAEQARQBWAEUAUwBSADAAMgAuAEUAUwAuAEEARAAuAEEARABQAC4AYwBvAG0ABQAUAEEARAAuAEEARABQAC4AYwBvAG0ABwAIAMWOHe89W9QBAABFAFMAawB1AG0AYQByAHMANQBSAE8AUwBFAEwAQwBEAFYAMAAwADAAMQBMAEoAQwA=[\r][\n]"
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >> "[\r][\n]"
>>> DEBUG 2018-10-03T13:24:24,631 (Thread-49328) - http-outgoing-90 >> "http://schemas.xmlsoap.org/soap/envelope/; xmlns:xsd=" http://www.w3.org/2001/XMLSchema; xmlns:xsi=" http://www.w3.org/2001/XMLSchema-instance; xmlns="http://schemas.microsoft.com/sharepoint/soap/directory/;>Excel Services Viewers"
>>> DEBUG 2018-10-03T13:24:24,806 (Thread-49328) - http-outgoing-90 << "HTTP/1.1 401 Unauthorized[\r][\n]"
>>>
>>> OR
>>>
>>> DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >> "POST /sites/mysite/_vti_bin/lists.asmx HTTP/1.1[\r][\n]"
>>> DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >> "Content-Type: text/xml; charset=utf-8[\r][\n]"
>>> DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >> "Accept: */*[\r][\n]"
>>> DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >> "SOAPAction: "http://schemas.microsoft.com/sharepoint/soap/GetListCollection"[\r][\n]"
>>> DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >> "User-Agent: Axis/1.4[\r][\n]"
>>> DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >> "Content-Length: 335[\r][\n]"
>>> DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >> "Host: dit.apps.com[\r][\n]"
>>> DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >> "Connection: Keep-Alive[\r][\n]"
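One way to read the 401 in the first log excerpt above: in the NTLM-over-HTTP handshake, a 401 is the expected answer to the initial negotiate message, whereas a 401 that follows the authenticate message usually suggests the server did not accept the credentials. The following is a minimal sketch, not part of ManifoldCF, that reports which NTLM message a header value carries. The function name `ntlm_message_type` is an illustrative assumption, and the demonstration uses a synthetic token because the token in the archived log may not have survived intact.

```python
import base64
import struct

# Every NTLM message starts with this 8-byte signature, followed by a
# 4-byte little-endian message type.
NTLM_SIGNATURE = b"NTLMSSP\x00"
MESSAGE_TYPES = {1: "negotiate", 2: "challenge", 3: "authenticate"}


def ntlm_message_type(header_value: str) -> str:
    """Name the NTLM message carried by an 'NTLM <base64>' header value.

    Hypothetical helper for reading wire logs; it does not validate the
    rest of the message.
    """
    scheme, _, token = header_value.strip().partition(" ")
    if scheme.upper() != "NTLM" or not token.strip():
        raise ValueError("not an NTLM header value")
    raw = base64.b64decode(token.strip())
    if len(raw) < 12 or raw[:8] != NTLM_SIGNATURE:
        raise ValueError("missing NTLMSSP signature")
    (msg_type,) = struct.unpack_from("<I", raw, 8)
    return MESSAGE_TYPES.get(msg_type, f"unknown ({msg_type})")


if __name__ == "__main__":
    # Synthetic authenticate-type token, built only for this demonstration.
    fake = base64.b64encode(NTLM_SIGNATURE + struct.pack("<I", 3)).decode()
    print(ntlm_message_type("NTLM " + fake))
```

If a fresh `Authorization` header from the live log reports `authenticate` and the next response is still 401, it may be worth comparing the user name and domain configured in the connection with what Postman sends.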
Re: Query to get the number of documents processed from PostgreSQL
If you want all the documents for a specific job, the query is:

select count(*) from jobqueue where jobid=

Karl

On Mon, Oct 8, 2018 at 4:23 AM Romaric Pighetti <romaric.pighe...@francelabs.com> wrote:

> Hi Karl,
>
> I currently need to get the number of documents processed by MCF in a
> specific job.
> This number now exceeds the limit set for the web interface, and I don't
> want to raise that limit because of the load it would put on the
> database (opening the tab in the UI fires queries for all the jobs, and
> I know from previous reading that these queries are expensive for
> PostgreSQL).
> So I would like to know whether you can give me the query the interface
> uses to display the number of processed documents, so that I can run it
> against PostgreSQL manually for just the job I am interested in,
> lowering the impact on PostgreSQL.
>
> Thanks for your help.
> Romaric
>
> --
> Romaric Pighetti
> France Labs – The Search experts
>
> The creators of Datafari 4, THE enterprise search solution
>
> www.francelabs.com
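The count query above can also be run from a script rather than by hand. This is a minimal sketch, assuming a Python DB-API 2.0 connection to the ManifoldCF database. Only the `jobqueue` table and `jobid` column come from the thread; the function name `count_job_documents`, its `placeholder` parameter, and the cut-down demo table are illustrative assumptions. The job ID is passed as a bound parameter because the value is not given in the message.

```python
import sqlite3


def count_job_documents(conn, job_id, placeholder="%s"):
    """Count the jobqueue rows belonging to one job.

    `conn` is any DB-API 2.0 connection. `placeholder` is the driver's
    parameter marker: "%s" for psycopg2, "?" for sqlite3.
    """
    sql = f"SELECT count(*) FROM jobqueue WHERE jobid = {placeholder}"
    cur = conn.cursor()
    try:
        cur.execute(sql, (job_id,))
        return cur.fetchone()[0]
    finally:
        cur.close()


if __name__ == "__main__":
    # Stand-in database so the sketch runs without a PostgreSQL server.
    # The real jobqueue table has more columns than this demo version.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE jobqueue (id INTEGER PRIMARY KEY, jobid INTEGER)")
    demo.executemany("INSERT INTO jobqueue (jobid) VALUES (?)", [(1,), (1,), (2,)])
    print(count_job_documents(demo, 1, placeholder="?"))
```

Against PostgreSQL, the same function would be called with a psycopg2 connection and the default placeholder. Note that this counts every queue row for the job, as in the reply, rather than only rows in a particular processing state.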
Query to get the number of documents processed from PostgreSQL
Hi Karl,

I currently need to get the number of documents processed by MCF in a specific job.
This number now exceeds the limit set for the web interface, and I don't want to raise that limit because of the load it would put on the database (opening the tab in the UI fires queries for all the jobs, and I know from previous reading that these queries are expensive for PostgreSQL).
So I would like to know whether you can give me the query the interface uses to display the number of processed documents, so that I can run it against PostgreSQL manually for just the job I am interested in, lowering the impact on PostgreSQL.

Thanks for your help.
Romaric

--
Romaric Pighetti
France Labs – The Search experts
The creators of Datafari 4, THE enterprise search solution
www.francelabs.com