Re: [WITHDRAW][VOTE] Release Apache ManifoldCF SharePoint 2010 plugin 0.1 RC0

2012-09-07 Thread Piergiorgio Lucidi
I have now tried the latest code, and the crawling process terminates
correctly without any exception.

But I noticed that while the job is running, the total number of documents
is shown correctly, and you can see this number incrementing gradually
until the end of the job.

But at the end of the job, the number of documents shown in the UI returns
to a value equal to the number of libraries that you have configured.

In the document history I only see three libraries shown in the Document
Status view:

- /
- /My Custom Library 1//
- /My Custom Library 2//

Is this correct?

Hope this helps.
Piergiorgio

2012/9/6 Karl Wright daddy...@gmail.com

 I also just updated the plugin at sharepoint-2010/trunk to precisely
 follow a Microsoft example I found.  Could you give this a try as
 well?

 Karl

 On Thu, Sep 6, 2012 at 5:08 PM, Karl Wright daddy...@gmail.com wrote:
  Thanks for trying this.
 
  Just as a check I increased the number of documents that will be
  requested on the connector side to 1.  If you synch up trunk and
  try again, it should give me an idea whether the failing logic is on
  the connector side or the server side.  (I suspect it is still the
  server side, which means that SharePoint's pagination is not working,
  but let's confirm that.)
 
  Thanks,
  Karl
 
  On Thu, Sep 6, 2012 at 4:19 PM, Ahmet Arslan iori...@yahoo.com wrote:
  I checked in a fix for the pagination issue in
  integration/sharepoint-2010/trunk.  Care to build the plugin, deploy
  it, and let me know if it works now?
 
  Hi,
 
  svnversion 1381729 still indexes 1000 docs.
 
  Documents: 1002
  Active: 0
  Processed: 1002
 
  Ahmet

 --
 Piergiorgio Lucidi
 http://www.open4dev.com




Re: [WITHDRAW][VOTE] Release Apache ManifoldCF SharePoint 2010 plugin 0.1 RC0

2012-09-07 Thread Karl Wright
It looks like the FileRef field includes the site path.  I've checked
in a modification that should help.
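
For illustration only, here is a minimal sketch of one way to tolerate both
forms of the path: one that already includes the site path and one that does
not. This is not the change that was checked in. The class SitePathSketch and
the helper toSiteRelative are hypothetical names, and the example paths are
taken from the exception reported elsewhere in this thread.

```java
class SitePathSketch {
    /**
     * Hypothetical helper: returns the path relative to sitePath, with a
     * leading slash. Accepts a path that includes sitePath or one that does not.
     */
    static String toSiteRelative(String path, String sitePath) {
        // Make sure the path starts with a slash so comparisons are uniform.
        String normalized = path.startsWith("/") ? path : "/" + path;
        // Drop a trailing slash from the site path, if any.
        String site = sitePath.endsWith("/")
            ? sitePath.substring(0, sitePath.length() - 1)
            : sitePath;
        // Strip the site path only when it is a whole-segment prefix.
        if (!site.isEmpty()
            && (normalized.equals(site) || normalized.startsWith(site + "/"))) {
            normalized = normalized.substring(site.length());
        }
        return normalized.isEmpty() ? "/" : normalized;
    }

    public static void main(String[] args) {
        String site = "/my/personal/administrator/demosite";
        // Path without the site prefix.
        System.out.println(toSiteRelative("Library Custom/actions-article-v2.pdf", site));
        // Path with the site prefix.
        System.out.println(toSiteRelative(site + "/Library Custom/actions-article-v2.pdf", site));
    }
}
```

Both calls in main yield the same site-relative path, so a caller would not
need to know which form the server returned.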
Karl

On Fri, Sep 7, 2012 at 11:34 AM, Piergiorgio Lucidi
piergior...@apache.org wrote:
 I have tried now the latest code and now the crawling process is
 terminating correctly without any exception.

 But I noticed that when the job is running, the total number of documents
 are correctly shown, and you can see that this number is incrementing
 gradually until the end of the job.

 But at the end of the job the number of documents shown in the UI returns a
 value equals to the number of libraries that you have configured.

 In the document history I only see three libraries shown in the Document
 Status view:

 - /
 - /My Custom Library 1//
 - /My Custom Library 2//

 Is this correct?

 Hope this helps.
 Piergiorgio

 2012/9/6 Karl Wright daddy...@gmail.com

 I also just updated the plugin at sharepoint-2010/trunk to precisely
 follow a Microsoft example I found.  Could you give this a try as
 well?

 Karl

 On Thu, Sep 6, 2012 at 5:08 PM, Karl Wright daddy...@gmail.com wrote:
  Thanks for trying this.
 
  Just as a check I increased the number of documents that will be
  requested on the connector side to 1.  If you synch up trunk and
  try again, it should give me an idea whether the failing logic is on
  the connector side or the server side.  (I suspect it is still the
  server side, which means that SharePoint's pagination is not working,
  but let's confirm that.)
 
  Thanks,
  Karl
 
  On Thu, Sep 6, 2012 at 4:19 PM, Ahmet Arslan iori...@yahoo.com wrote:
  I checked in a fix for the pagination issue in
  integration/sharepoint-2010/trunk.  Care to build the plugin, deploy
  it, and let me know if it works now?
 
  Hi,
 
  svnversion 1381729 still indexes 1000 docs.
 
  Documents: 1002
  Active: 0
  Processed: 1002
 
  Ahmet

 --
 Piergiorgio Lucidi
 http://www.open4dev.com




Re: [WITHDRAW][VOTE] Release Apache ManifoldCF SharePoint 2010 plugin 0.1 RC0

2012-09-07 Thread Ahmet Arslan
Hello,

Same here: with svnversion 1382074, all 7888 documents were indexed without
any exception.

Ahmet






[WITHDRAW][VOTE] Release Apache ManifoldCF SharePoint 2010 plugin 0.1 RC0

2012-09-06 Thread Karl Wright
I conclude that the plugin is not handling paging properly - there's
no other explanation.  So I am canceling the vote and will try to
check in a fix.
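
As a generic illustration of the kind of paging loop at issue, here is a
minimal sketch, not the plugin's or the connector's actual code. PagingSketch,
PageSource, and fetchAll are hypothetical names; the page size of 1000 and the
total of 7888 items are taken from the report quoted below. The loop keeps
requesting pages until a short page signals the end, so a crawl would not stop
after the first 1000 items.

```java
import java.util.ArrayList;
import java.util.List;

class PagingSketch {
    /** Hypothetical page source: returns up to pageSize items starting at offset. */
    interface PageSource {
        List<String> fetch(int offset, int pageSize);
    }

    /** Requests pages repeatedly until a short (or empty) page is returned. */
    static List<String> fetchAll(PageSource source, int pageSize) {
        List<String> all = new ArrayList<>();
        int offset = 0;
        while (true) {
            List<String> page = source.fetch(offset, pageSize);
            all.addAll(page);
            if (page.size() < pageSize) {
                break; // fewer items than requested: this was the last page
            }
            offset += page.size();
        }
        return all;
    }

    public static void main(String[] args) {
        // Simulated library holding 7888 items.
        final int total = 7888;
        PageSource source = (offset, size) -> {
            List<String> page = new ArrayList<>();
            for (int i = offset; i < Math.min(total, offset + size); i++) {
                page.add("doc-" + i);
            }
            return page;
        };
        System.out.println(fetchAll(source, 1000).size());
    }
}
```

A loop that returned after the first call to fetch would report exactly one
page of items, which matches the symptom of only 1000 documents being indexed.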

Karl

On Thu, Sep 6, 2012 at 1:11 PM, Karl Wright daddy...@gmail.com wrote:
 It looks like two problems here.  First, it looks like Solr is
 throwing a 500 error for at least one of the documents in your set.

 However, the fact that you only get 1000 documents indexed also shows
 that the code is still broken in some way.  I will check into whether
 this looks like a problem in the connector or in the plugin.

 Karl

 On Thu, Sep 6, 2012 at 1:06 PM, Ahmet Arslan iori...@yahoo.com wrote:
 Hi Karl,

 With a document library that has 7,888 items, I set up a crawl with mcf-trunk.
 Sometimes I get this exception: Error: Repeated service interruptions -
 failure processing document: Ingestion HTTP error code 500

 If I don't get the exception, only 1000 docs are indexed.

 ERROR 2012-09-06 19:55:13,587 (Worker thread '30') - Exception tossed: Repeated service interruptions - failure processing document: Ingestion HTTP error code 500
 org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: Ingestion HTTP error code 500
     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
 Caused by: org.apache.manifoldcf.core.interfaces.ManifoldCFException: Ingestion HTTP error code 500
     at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:1386)

 Ahmet
 --- On Wed, 9/5/12, Karl Wright daddy...@gmail.com wrote:

 From: Karl Wright daddy...@gmail.com
 Subject: [VOTE] Release Apache ManifoldCF SharePoint 2010 plugin 0.1 RC0
 To: dev dev@manifoldcf.apache.org
 Date: Wednesday, September 5, 2012, 11:52 PM
 Vote +1 if you think the Apache ManifoldCF SharePoint 2010 plugin 0.1
 RC0 is ready for release.

 The release artifact can be found at:
 http://people.apache.org/~kwright/apache-manifoldcf-sharepoint-2010-plugin-0.1
 .

 There is also a release tag at:
 https://svn.apache.org/repos/asf/manifoldcf/integration/sharepoint-2010/tags/release-0.1-RC0

 Karl



Re: [WITHDRAW][VOTE] Release Apache ManifoldCF SharePoint 2010 plugin 0.1 RC0

2012-09-06 Thread Piergiorgio Lucidi
I have now tried upgrading the plugin on SharePoint and rebuilding the
connector from trunk, but I get this exception:

ERROR 2012-09-06 20:20:00,993 (Worker thread '41') - Exception tossed: Internal error: Relative path 'Library Custom/actions-article-v2.pdf' was expected to start with '/my/personal/administrator/demosite'
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Internal error: Relative path 'Library Custom/actions-article-v2.pdf' was expected to start with '/my/personal/administrator/demosite'
    at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getChildren(SPSProxyHelper.java:655)
    at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1303)
    at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:561)

It seems that there is an issue with the URL again.

Hope this helps.
Piergiorgio

2012/9/6 Karl Wright daddy...@gmail.com

 I checked in a fix for the pagination issue in
 integration/sharepoint-2010/trunk.  Care to build the plugin, deploy
 it, and let me know if it works now?

 Karl


 On Thu, Sep 6, 2012 at 1:26 PM, Karl Wright daddy...@gmail.com wrote:
  I conclude that the plugin is not handling paging properly - there's
  no other explanation.  So I am canceling the vote and will try to
  check in a fix.
 
  Karl
 
  On Thu, Sep 6, 2012 at 1:11 PM, Karl Wright daddy...@gmail.com wrote:
  It looks like two problems here.  First, it looks like Solr is
  throwing a 500 error for at least one of the documents in your set.
 
  However, the fact that you only get 1000 documents indexed also shows
  that the code is still broken in some way.  I will check into whether
  this looks like a problem in the connector or in the plugin.
 
  Karl
 
  On Thu, Sep 6, 2012 at 1:06 PM, Ahmet Arslan iori...@yahoo.com wrote:
  Hi Karl,
 
  With a document library that has 7,888 items, I set up a crawl with
  mcf-trunk. Sometimes I get this exception: Error: Repeated service
  interruptions - failure processing document: Ingestion HTTP error code 500
 
  If I don't get the exception, only 1000 docs are indexed.
 
  ERROR 2012-09-06 19:55:13,587 (Worker thread '30') - Exception tossed: Repeated service interruptions - failure processing document: Ingestion HTTP error code 500
  org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: Ingestion HTTP error code 500
      at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
  Caused by: org.apache.manifoldcf.core.interfaces.ManifoldCFException: Ingestion HTTP error code 500
      at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:1386)
 
  Ahmet
  --- On Wed, 9/5/12, Karl Wright daddy...@gmail.com wrote:
 
  From: Karl Wright daddy...@gmail.com
  Subject: [VOTE] Release Apache ManifoldCF SharePoint 2010 plugin 0.1 RC0
  To: dev dev@manifoldcf.apache.org
  Date: Wednesday, September 5, 2012, 11:52 PM
  Vote +1 if you think the Apache ManifoldCF SharePoint 2010 plugin 0.1
  RC0 is ready for release.
 
  The release artifact can be found at:
 
 http://people.apache.org/~kwright/apache-manifoldcf-sharepoint-2010-plugin-0.1
  .
 
  There is also a release tag at:
 
 https://svn.apache.org/repos/asf/manifoldcf/integration/sharepoint-2010/tags/release-0.1-RC0
 
  Karl
 

 --
 Piergiorgio Lucidi
 http://www.open4dev.com




Re: [WITHDRAW][VOTE] Release Apache ManifoldCF SharePoint 2010 plugin 0.1 RC0

2012-09-06 Thread Karl Wright
Thanks for trying this.

Just as a check I increased the number of documents that will be
requested on the connector side to 1.  If you synch up trunk and
try again, it should give me an idea whether the failing logic is on
the connector side or the server side.  (I suspect it is still the
server side, which means that SharePoint's pagination is not working,
but let's confirm that.)

Thanks,
Karl

On Thu, Sep 6, 2012 at 4:19 PM, Ahmet Arslan iori...@yahoo.com wrote:
 I checked in a fix for the pagination issue in
 integration/sharepoint-2010/trunk.  Care to build the plugin, deploy
 it, and let me know if it works now?

 Hi,

 svnversion 1381729 still indexes 1000 docs.

 Documents: 1002
 Active: 0
 Processed: 1002

 Ahmet


Re: [WITHDRAW][VOTE] Release Apache ManifoldCF SharePoint 2010 plugin 0.1 RC0

2012-09-06 Thread Karl Wright
I also just updated the plugin at sharepoint-2010/trunk to precisely
follow a Microsoft example I found.  Could you give this a try as
well?

Karl

On Thu, Sep 6, 2012 at 5:08 PM, Karl Wright daddy...@gmail.com wrote:
 Thanks for trying this.

 Just as a check I increased the number of documents that will be
 requested on the connector side to 1.  If you synch up trunk and
 try again, it should give me an idea whether the failing logic is on
 the connector side or the server side.  (I suspect it is still the
 server side, which means that SharePoint's pagination is not working,
 but let's confirm that.)

 Thanks,
 Karl

 On Thu, Sep 6, 2012 at 4:19 PM, Ahmet Arslan iori...@yahoo.com wrote:
 I checked in a fix for the pagination issue in
 integration/sharepoint-2010/trunk.  Care to build the plugin, deploy
 it, and let me know if it works now?

 Hi,

 svnversion 1381729 still indexes 1000 docs.

 Documents: 1002
 Active: 0
 Processed: 1002

 Ahmet