Re: [WITHDRAW][VOTE] Release Apache ManifoldCF SharePoint 2010 plugin 0.1 RC0
I have now tried the latest code, and the crawling process now terminates correctly without any exception.

But I noticed that while the job is running, the total number of documents is shown correctly, and you can see this number increasing gradually until the end of the job. At the end of the job, however, the number of documents shown in the UI drops to a value equal to the number of libraries that you have configured. In the document history, the Document Status view shows only three entries:

- /
- /My Custom Library 1//
- /My Custom Library 2//

Is this correct?

Hope this helps.

Piergiorgio

2012/9/6 Karl Wright daddy...@gmail.com

> I also just updated the plugin at sharepoint-2010/trunk to precisely follow a Microsoft example I found. Could you give this a try as well?
>
> Karl
>
> On Thu, Sep 6, 2012 at 5:08 PM, Karl Wright daddy...@gmail.com wrote:
>
>> Thanks for trying this. Just as a check I increased the number of documents that will be requested on the connector side to 1. If you synch up trunk and try again, it should give me an idea whether the failing logic is on the connector side or the server side. (I suspect it is still the server side, which means that SharePoint's pagination is not working, but let's confirm that.)
>>
>> Thanks,
>> Karl
>>
>> On Thu, Sep 6, 2012 at 4:19 PM, Ahmet Arslan iori...@yahoo.com wrote:
>>
>>>> I checked in a fix for the pagination issue in integration/sharepoint-2010/trunk. Care to build the plugin, deploy it, and let me know if it works now?
>>>
>>> Hi,
>>>
>>> svnversion 1381729 still indexes only 1000 docs.
>>>
>>> Documents: 1002 Active: 0 Processed: 1002
>>>
>>> Ahmet

-- 
Piergiorgio Lucidi
http://www.open4dev.com
Re: [WITHDRAW][VOTE] Release Apache ManifoldCF SharePoint 2010 plugin 0.1 RC0
It looks like the FileRef field includes the site path. I've checked in a modification that should help.

Karl
Re: [WITHDRAW][VOTE] Release Apache ManifoldCF SharePoint 2010 plugin 0.1 RC0
Hello,

Same here, with svnversion 1382074, all 7888 documents indexed without any exception.

Ahmet
[WITHDRAW][VOTE] Release Apache ManifoldCF SharePoint 2010 plugin 0.1 RC0
I conclude that the plugin is not handling paging properly; there's no other explanation. So I am canceling the vote and will try to check in a fix.

Karl

On Thu, Sep 6, 2012 at 1:11 PM, Karl Wright daddy...@gmail.com wrote:

> It looks like there are two problems here. First, Solr appears to be throwing a 500 error for at least one of the documents in your set. Second, the fact that you only get 1000 documents indexed shows that the code is still broken in some way. I will check whether this looks like a problem in the connector or in the plugin.
>
> Karl
>
> On Thu, Sep 6, 2012 at 1:06 PM, Ahmet Arslan iori...@yahoo.com wrote:
>
>> Hi Karl,
>>
>> With a document library that has 7,888 items, I set up a crawl with mcf-trunk. Sometimes I get this exception:
>>
>> Error: Repeated service interruptions - failure processing document: Ingestion HTTP error code 500
>>
>> If I don't get the exception, only 1000 docs are indexed.
>>
>> ERROR 2012-09-06 19:55:13,587 (Worker thread '30') - Exception tossed: Repeated service interruptions - failure processing document: Ingestion HTTP error code 500
>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: Ingestion HTTP error code 500
>>     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
>> Caused by: org.apache.manifoldcf.core.interfaces.ManifoldCFException: Ingestion HTTP error code 500
>>     at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:1386)
>>
>> Ahmet
>>
>> --- On Wed, 9/5/12, Karl Wright daddy...@gmail.com wrote:
>>
>> From: Karl Wright daddy...@gmail.com
>> Subject: [VOTE] Release Apache ManifoldCF SharePoint 2010 plugin 0.1 RC0
>> To: dev dev@manifoldcf.apache.org
>> Date: Wednesday, September 5, 2012, 11:52 PM
>>
>>> Vote +1 if you think the Apache ManifoldCF SharePoint 2010 plugin 0.1 RC0 is ready for release.
>>>
>>> The release artifact can be found at: http://people.apache.org/~kwright/apache-manifoldcf-sharepoint-2010-plugin-0.1 .
>>>
>>> There is also a release tag at: https://svn.apache.org/repos/asf/manifoldcf/integration/sharepoint-2010/tags/release-0.1-RC0
>>>
>>> Karl
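The 1000-document ceiling in Ahmet's report is the symptom you would expect if a client read only the first page of a paged result set and never asked for the next one. The Java sketch below illustrates that failure mode and the usual fix: keep requesting pages until the server stops returning a continuation token. Everything in it is hypothetical: `PagingSketch`, `Page`, `fetchPage`, the integer offset used as a token, and the 1000-item per-response cap are illustrative assumptions chosen to mirror the numbers in this thread, not the plugin's code, the connector's code, or a documented SharePoint limit.

```java
import java.util.ArrayList;
import java.util.List;

public class PagingSketch {
    // Hypothetical per-response cap, chosen only to mirror the 1000 docs seen in this thread.
    static final int PAGE_LIMIT = 1000;

    // One page of results plus an opaque continuation token (null means no more pages).
    record Page(List<String> items, String nextToken) {}

    // Hypothetical server stub: returns at most PAGE_LIMIT items per call,
    // no matter how many the caller requests. The token here is just an offset.
    static Page fetchPage(List<String> all, String token, int requested) {
        if (requested <= 0) {
            throw new IllegalArgumentException("requested must be positive");
        }
        int start = (token == null) ? 0 : Integer.parseInt(token);
        int end = Math.min(start + Math.min(requested, PAGE_LIMIT), all.size());
        List<String> items = new ArrayList<>(all.subList(start, end));
        String next = (end < all.size()) ? Integer.toString(end) : null;
        return new Page(items, next);
    }

    // Client side: keep asking for pages until the server stops sending a token.
    static List<String> fetchAll(List<String> all, int requested) {
        List<String> result = new ArrayList<>();
        String token = null;
        do {
            Page page = fetchPage(all, token, requested);
            result.addAll(page.items());
            token = page.nextToken();
        } while (token != null);
        return result;
    }

    public static void main(String[] args) {
        List<String> library = new ArrayList<>();
        for (int i = 0; i < 7888; i++) {
            library.add("doc-" + i);
        }
        // Reading only the first page stops at the cap...
        int firstPageOnly = fetchPage(library, null, 10000).items().size();
        // ...while following the tokens collects every item.
        int allPages = fetchAll(library, 10000).size();
        System.out.println(firstPageOnly + " " + allPages); // prints "1000 7888"
    }
}
```

Because the loop is driven by the token rather than by the requested batch size, it returns the same items whether the client asks for 1 or 10000 per call; only the number of round trips changes.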
Re: [WITHDRAW][VOTE] Release Apache ManifoldCF SharePoint 2010 plugin 0.1 RC0
I have now tried upgrading the plugin on SharePoint and rebuilding the connector from trunk, but I get this exception:

ERROR 2012-09-06 20:20:00,993 (Worker thread '41') - Exception tossed: Internal error: Relative path 'Library Custom/actions-article-v2.pdf' was expected to start with '/my/personal/administrator/demosite'
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Internal error: Relative path 'Library Custom/actions-article-v2.pdf' was expected to start with '/my/personal/administrator/demosite'
    at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getChildren(SPSProxyHelper.java:655)
    at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1303)
    at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:561)

It seems that there is an issue with the URL again.

Hope this helps.

Piergiorgio
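The error in the log above comes from a prefix check: a document path that does not begin with the site path is rejected. The Java sketch below shows what such a check typically looks like, assuming the caller wants paths relative to the site. `SitePathSketch` and `toSiteRelative` are hypothetical names, and this is not the actual logic of `SPSProxyHelper.getChildren`; it only reproduces the shape of the error message.

```java
public class SitePathSketch {
    // Hypothetical helper: convert a server-relative path such as
    // "/my/personal/administrator/demosite/Library Custom/file.pdf"
    // into a path relative to the site, e.g. "Library Custom/file.pdf".
    static String toSiteRelative(String sitePath, String serverRelativePath) {
        // Compare against "sitePath/" so that a sibling site such as
        // ".../demosite2" is not mistaken for a child of ".../demosite".
        String prefix = sitePath.endsWith("/") ? sitePath : sitePath + "/";
        if (!serverRelativePath.startsWith(prefix)) {
            throw new IllegalArgumentException("Relative path '" + serverRelativePath
                    + "' was expected to start with '" + sitePath + "'");
        }
        return serverRelativePath.substring(prefix.length());
    }

    public static void main(String[] args) {
        String site = "/my/personal/administrator/demosite";
        // A path that includes the site prefix is accepted and trimmed.
        System.out.println(toSiteRelative(site, site + "/Library Custom/actions-article-v2.pdf"));
        // A path without the site prefix fails the check with a similar message.
        try {
            toSiteRelative(site, "Library Custom/actions-article-v2.pdf");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The check only works if every caller agrees on whether paths carry the site prefix; a path that arrives already library-relative, like 'Library Custom/actions-article-v2.pdf', trips it even though the document itself is fine.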
Re: [WITHDRAW][VOTE] Release Apache ManifoldCF SharePoint 2010 plugin 0.1 RC0
Thanks for trying this. Just as a check I increased the number of documents that will be requested on the connector side to 1. If you synch up trunk and try again, it should give me an idea whether the failing logic is on the connector side or the server side. (I suspect it is still the server side, which means that SharePoint's pagination is not working, but let's confirm that.)

Thanks,
Karl
Re: [WITHDRAW][VOTE] Release Apache ManifoldCF SharePoint 2010 plugin 0.1 RC0
I also just updated the plugin at sharepoint-2010/trunk to precisely follow a Microsoft example I found. Could you give this a try as well?

Karl