Hi Ahmet,

We could treat .aspx files specially, so that they are considered to
never have any content.  But are there cases where someone might want
to index whatever content these URLs return?  Specifically, what do
.aspx "files" typically contain when found in a SharePoint hierarchy?
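
To make the idea concrete, here is a rough sketch of the kind of check I
have in mind (purely illustrative; the class and method names are made up
and this is not existing connector code):

import java.util.Locale;

/**
 * Illustrative sketch only -- not actual ManifoldCF code.  Shows the idea
 * of treating SharePoint .aspx pages as metadata-only documents.
 */
public final class AspxContentPolicy {
  /** Returns true if the document should be indexed without its body content. */
  static boolean metadataOnly(String documentUrl) {
    return documentUrl.toLowerCase(Locale.ROOT).endsWith(".aspx");
  }

  public static void main(String[] args) {
    System.out.println(metadataOnly("http://sp/sites/test/Pages/Home.aspx"));    // true
    System.out.println(metadataOnly("http://sp/sites/test/Shared/report.docx")); // false
  }
}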

Karl

On Mon, Jan 14, 2013 at 11:37 AM, Ahmet Arslan <iori...@yahoo.com> wrote:
> Hi Karl,
>
> Now 39 aspx files (out of 130) are indexed. Job didn't get killed. No 
> exceptions in the log.
>
> I increased the maximum POST size of solr/jetty, but that count of 39 didn't
> increase.
>
> I will check the sizes of the remaining (130 - 39) *.aspx files.
>
> Actually I am mapping the extracted content of these aspx files to an ignored
> dynamic field (fmap.content=content_ignored). I don't use the content; I am only
> interested in the metadata of these aspx files. It would be great if there were a
> setting to grab just the metadata, similar to Lists.
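
(Side note: that fmap mapping is an extracting-handler request parameter.  A
minimal SolrJ sketch of an equivalent call is below; the core URL, file, and
literal.id values are placeholders, not the actual configuration.)

import java.io.File;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

// Sketch only: sends a document to /update/extract and maps the extracted
// body into an ignored field, so that only the metadata fields are kept.
public class ExtractMetadataOnly {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/all");
    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
    req.addFile(new File("Home.aspx"), "text/html");
    req.setParam("literal.id", "http://sp/sites/test/Pages/Home.aspx");
    req.setParam("fmap.content", "content_ignored");  // route the body to an ignored field
    server.request(req);
    server.commit();
  }
}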
>
> Thanks,
> Ahmet
>
> --- On Mon, 1/14/13, Karl Wright <daddy...@gmail.com> wrote:
>
>> From: Karl Wright <daddy...@gmail.com>
>> Subject: Re: Repeated service interruptions - failure processing document: null
>> To: dev@manifoldcf.apache.org
>> Date: Monday, January 14, 2013, 5:46 PM
>> I checked in a fix for this ticket on
>> trunk.  Please let me know if it
>> resolves this issue.
>>
>> Karl
>>
>> On Mon, Jan 14, 2013 at 10:20 AM, Karl Wright <daddy...@gmail.com>
>> wrote:
>> > This is because httpclient retries on error three times by default.
>> > This has to be disabled in the Solr connector, or the rest of the logic
>> > won't work right.
>> >
>> > I've opened a ticket (CONNECTORS-610) for this problem too.
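
(For reference, a minimal sketch of turning that retry behavior off with
HttpClient 4.x; the factory class below is illustrative, not the actual
connector code:)

import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.client.DefaultHttpRequestRetryHandler;

// Sketch only: builds a client whose retry handler allows zero retries,
// overriding httpclient's default of three attempts.
public class NoRetryClientFactory {
  public static DefaultHttpClient create() {
    DefaultHttpClient client = new DefaultHttpClient();
    client.setHttpRequestRetryHandler(new DefaultHttpRequestRetryHandler(0, false));
    return client;
  }
}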
>> >
>> > Karl
>> >
>> > On Mon, Jan 14, 2013 at 10:13 AM, Ahmet Arslan <iori...@yahoo.com> wrote:
>> >> Hi Karl,
>> >>
>> >> Thanks for quick fix.
>> >>
>> >> I am still seeing the following error after 'svn up' and 'ant build'
>> >>
>> >> ERROR 2013-01-14 17:09:41,949 (Worker thread '6') - Exception tossed: Repeated service interruptions - failure processing document: null
>> >> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: null
>> >>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
>> >> Caused by: org.apache.http.client.ClientProtocolException
>> >>         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
>> >>         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
>> >>         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
>> >>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
>> >>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>> >>         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>> >>         at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:790)
>> >> Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.  The cause lists the reason the original request failed.
>> >>         at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
>> >>         at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
>> >>         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
>> >>         ... 6 more
>> >> Caused by: java.net.SocketException: Broken pipe
>> >>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>> >>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>> >>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>> >>         at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
>> >>         at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
>> >>         at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
>> >>         at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
>> >>         at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
>> >>         at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
>> >>         at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
>> >>         at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
>> >>         at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
>> >>         at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
>> >>         at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>> >>         at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
>> >>         ... 8 more
>> >>
>> >>
>> >>
>> >> --- On Mon, 1/14/13, Karl Wright <daddy...@gmail.com> wrote:
>> >>
>> >>> From: Karl Wright <daddy...@gmail.com>
>> >>> Subject: Re: Repeated service interruptions - failure processing document: null
>> >>> To: dev@manifoldcf.apache.org
>> >>> Date: Monday, January 14, 2013, 3:30 PM
>> >>> Hi Ahmet,
>> >>>
>> >>> The exception that seems to be causing the abort is a socket
>> >>> exception coming from a socket write:
>> >>>
>> >>> > Caused by: java.net.SocketException: Broken pipe
>> >>>
>> >>> This makes sense in light of the http code returned from Solr, which
>> >>> was 413:  http://www.checkupdown.com/status/E413.html .
>> >>>
>> >>> So there is nothing actually *wrong* with the .aspx documents, but
>> >>> they are just way too big, and Solr is rejecting them for that reason.
>> >>>
>> >>> Clearly, though, the Solr connector should recognize this code as
>> >>> meaning "never retry", so instead of killing the job, it should just
>> >>> skip the document.  I'll open a ticket for that now.
>> >>>
>> >>> Karl
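
(To illustrate the "never retry" idea in code -- hypothetical names only,
not the real HttpPoster logic:)

// Hypothetical helper, not the actual HttpPoster code: classify an HTTP
// status returned by Solr during indexing.
enum IndexOutcome { SUCCESS, RETRY_LATER, SKIP_DOCUMENT }

final class SolrResponsePolicy {
  static IndexOutcome classify(int httpStatus) {
    if (httpStatus == 200) {
      return IndexOutcome.SUCCESS;
    }
    if (httpStatus == 413) {
      // Request too large: retrying the same document can never succeed,
      // so skip it rather than aborting the whole job.
      return IndexOutcome.SKIP_DOCUMENT;
    }
    // Other failures (e.g. 503) may be transient and worth retrying later.
    return IndexOutcome.RETRY_LATER;
  }
}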
>> >>>
>> >>>
>> >>> On Mon, Jan 14, 2013 at 8:22 AM, Ahmet Arslan
>> <iori...@yahoo.com>
>> >>> wrote:
>> >>> > Hello,
>> >>> >
>> >>> > I am indexing a SharePoint 2010 instance using mcf-trunk (At revision 1432907)
>> >>> >
>> >>> > There is no problem with a Document library that contains word excel etc.
>> >>> >
>> >>> > However, I receive the following errors with a Document library that has *.aspx files in it.
>> >>> >
>> >>> > Status of Jobs => Error: Repeated service interruptions - failure processing document: null
>> >>> >
>> >>> >  WARN 2013-01-14 15:00:12,720 (Worker thread '13') - Service interruption reported for job 1358009105156 connection 'iknow': IO exception during indexing: null
>> >>> > ERROR 2013-01-14 15:00:12,763 (Worker thread '13') - Exception tossed: Repeated service interruptions - failure processing document: null
>> >>> > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: null
>> >>> >         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
>> >>> > Caused by: org.apache.http.client.ClientProtocolException
>> >>> >         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
>> >>> >         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
>> >>> >         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
>> >>> >         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
>> >>> >         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>> >>> >         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>> >>> >         at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:768)
>> >>> > Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.  The cause lists the reason the original request failed.
>> >>> >         at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
>> >>> >         at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
>> >>> >         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
>> >>> >         ... 6 more
>> >>> > Caused by: java.net.SocketException: Broken pipe
>> >>> >         at java.net.SocketOutputStream.socketWrite0(Native Method)
>> >>> >         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>> >>> >         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>> >>> >         at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
>> >>> >         at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
>> >>> >         at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
>> >>> >         at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
>> >>> >         at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
>> >>> >         at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
>> >>> >         at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
>> >>> >         at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
>> >>> >         at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
>> >>> >         at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
>> >>> >         at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>> >>> >         at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
>> >>> >         ... 8 more
>> >>> >
>> >>> > Status of Jobs => Error: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
>> >>> >
>> >>> > ERROR 2013-01-14 15:10:42,074 (Worker thread '15') - Exception tossed: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
>> >>> > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
>> >>> >         at org.apache.manifoldcf.agents.output.solr.HttpPoster.handleSolrException(HttpPoster.java:360)
>> >>> >         at org.apache.manifoldcf.agents.output.solr.HttpPoster.indexPost(HttpPoster.java:477)
>> >>> >         at org.apache.manifoldcf.agents.output.solr.SolrConnector.addOrReplaceDocument(SolrConnector.java:594)
>> >>> >         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.addOrReplaceDocument(IncrementalIngester.java:1579)
>> >>> >         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:504)
>> >>> >         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:370)
>> >>> >         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1652)
>> >>> >         at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1559)
>> >>> >         at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
>> >>> >         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:551)
>> >>> >
>> >>> > On the solr side I see:
>> >>> >
>> >>> > INFO: Creating new http client, config:maxConnections=200&maxConnectionsPerHost=8
>> >>> > 2013-01-14 15:18:21.775:WARN:oejh.HttpParser:Full [671412972,-1,m=5,g=6144,p=6144,c=6144]={2F736F6C722F616 ...long long chars ... 2B656B6970{}
>> >>> >
>> >>> > Thanks,
>> >>> > Ahmet
>> >>>
>>
