FWIW, I've chased this upstream in Solr to the following code, in
SolrRequestParsers.java, which creates disk-file-based streams from the content
part of individual multipart form requests:
>>>>>>
// Create a factory for disk-based file items
DiskFileItemFactory factory = new DiskFileItemFactory();
// Set factory constraints
// TODO - configure factory.setSizeThreshold(yourMaxMemorySize);
// TODO - configure factory.setRepository(yourTempDirectory);
// Create a new file upload handler
ServletFileUpload upload = new ServletFileUpload(factory);
upload.setSizeMax( uploadLimitKB*1024 );
// Parse the request
List items = upload.parseRequest(req);
Iterator iter = items.iterator();
while (iter.hasNext()) {
FileItem item = (FileItem) iter.next();
// If its a form field, put it in our parameter map
if (item.isFormField()) {
MultiMapSolrParams.addParam(
item.getFieldName(),
item.getString(), params.getMap() );
}
// Only add it if it actually has something...
else if( item.getSize() > 0 ) {
streams.add( new FileItemContentStream( item ) );
}
}
<<<<<<
It seems to me that there's opportunity here for Solr to proliferate temporary
disk files until a directory limit is reached, if there is inadequate cleanup
done after processing. I have not yet managed to either research the cleanup
end, or run experiments to verify this scenario, but if I correctly recall how
DiskFileItemFactory works, it creates files and registers them to be cleaned up
on JVM exit. If that's the only cleanup, that's not going to cut it for a
real-world system.
Karl
-----Original Message-----
From: Wright Karl (Nokia-S/Cambridge)
Sent: Monday, June 07, 2010 6:21 PM
To: [email protected]
Subject: Re: Solr spewage and dropped documents, while indexing
Thanks for the confirmation. I am not sure I buy the 'just can't keep up'
explanation though. The load on this box during this test looks reasonable and
there is no sign of thrashing etc. So I am inclined to believe either there's
a server timeout that is pathologically short, or it is a race condition of
some kind.
Karl
--- original message ---
From: "ext Thijs Vonk" <[email protected]>
Subject: Re: Solr spewage and dropped documents, while indexing
Date: June 7, 2010
Time: 5:46:36 PM
Hi Karl,
I'm just a novice Solr user, but I've seen this error some time ago, it
usually happend when I was pushing my machine to hard (the machine that
was gathering the documents to be send to solr). I was using to many
threads on the machine, and now that I've reduced those to a more
reasonable amount I haven't seen it. It was as if the machine just
didn't have enough CPU power to actually send everything over the wire.
I read that you had about 50 threads running. Depending an what they are
doing it seems like a lot, but that all depends totally on the hardware
you're running on I guess.
Thijs
On 6/7/10 5:30 PM, [email protected] wrote:
> Hi Simon,
>
> The same data was successfully indexed with one thread over the weekend. So
> it is unlikely to be a data issue.
>
> It turns out there was one of the content-missing exceptions far back in the
> screen buffer, but only one:
>
> Jun 7, 2010 9:57:48 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: missing content stream
> at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(Co
> ntentStreamHandlerBase.java:49)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandl
> erBase.java:131)
> at
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handle
> Request(RequestHandlers.java:233)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1321)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter
> .java:341)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilte
> r.java:244)
> at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(Servlet
> Handler.java:1089)
> at
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:3
> 65)
> at
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.jav
> a:216)
> at
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:1
> 81)
> at
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:7
> 12)
> at
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
> at
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHand
> lerCollection.java:211)
> at
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.
> java:114)
> at
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:1
> 39)
> at org.mortbay.jetty.Server.handle(Server.java:285)
> at
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:50
> 2)
> at
> org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnectio
> n.java:835)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
> at
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.
> java:226)
> at
> org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool
> .java:442)
>
> So it seems like Solr is behaving consistently with it being convinced of
> missing content.
>
> I am actually wondering if the problem is occurring because so much is
> happening at the same time we're seeing a socket timeout for some of the
> requests. If the socket times out and closes, then I could well imagine that
> Solr would toss this error. Thoughts?
>
> Karl
>
>
> -----Original Message-----
> From: ext Simon Willnauer [mailto:[email protected]]
> Sent: Monday, June 07, 2010 11:20 AM
> To: [email protected]
> Subject: Re: Solr spewage and dropped documents, while indexing
>
> Karl, the HTTP error lines are produced by your code right?! Can you
> provide what has been returned by Solr?
> If that would be related to any server side problem described above
> like no sockets or so you would not see a 400! I could also imagine
> that the documents you are sending are empty - is that something which
> could have happened?
>
> simon
>
> On Mon, Jun 7, 2010 at 5:05 PM,<[email protected]> wrote:
>
>> Perhaps - although missing_content_stream seems to imply that it had at
>> least partly read 4 requests which later failed. Also, wouldn't there be
>> something in the output log which would give us a clue as to what happened?
>>
>> Is there any post-hiccup spelunking I can reasonably do? Or should I try to
>> reproduce the problem with more diagnostics on?
>>
>> Karl
>>
>>
>> -----Original Message-----
>> From: ext Bernd Fondermann [mailto:[email protected]]
>> Sent: Monday, June 07, 2010 10:54 AM
>> To: [email protected]
>> Subject: Re: Solr spewage and dropped documents, while indexing
>>
>> Looks like a server-side problem to me.
>> Maybe the server ran out of sockets or other resources and just replied
>> with a 400 error?
>>
>> Bernd
>>
>> [email protected] wrote:
>>
>>> Hi folks,
>>>
>>> This morning I was experimenting with using multiple threads while indexing
>>> some 20,000,000 records worth of content. In fact, my test spun up some 50
>>> threads, and happily chugged away for a couple of hours before I saw the
>>> following output from my test code:
>>>
>>> Http protocol error: HTTP/1.1 400 missing_content_stream, while trying to
>>> index record 6469124
>>> Http protocol error: HTTP/1.1 400 missing_content_stream, while trying to
>>> index record 6469551
>>> Http protocol error: HTTP/1.1 400 missing_content_stream, while trying to
>>> index record 6470592
>>> Http protocol error: HTTP/1.1 400 missing_content_stream, while trying to
>>> index record 6472454
>>> java.net.SocketException: Connection reset
>>> at java.net.SocketInputStream.read(SocketInputStream.java:168)
>>> at HttpPoster.getResponse(HttpPoster.java:280)
>>> at HttpPoster.indexPost(HttpPoster.java:191)
>>> at ParseAndLoad$PostThread.run(ParseAndLoad.java:638)
>>> <<<<<<
>>>
>>> Looking at the solr-side output, I see nothing interesting at all:
>>>
>>> Jun 7, 2010 9:57:48 AM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/update/extract
>>> params={literal.nokia_longitude=9.78518981933594&literal.nokia_phone=%2B497971910474&literal.nokia_type=0&literal.nokia_boost=1&literal.nokia_district=Münster&literal.nokia_placerating=0&literal.id=6472724&literal.nokia_visitcount=0&literal.nokia_country=DEU&literal.nokia_housenumber=1&literal.nokia_ppid=276u0wyw-c8cb7f4d6cd84a639a4e7d3570bf8814&literal.nokia_language=de&literal.nokia_city=Gaildorf&literal.nokia_latitude=48.9985514322917&literal.nokia_postalcode=74405&literal.nokia_street=WeinhaldenstraÃe&literal.nokia_title=Dorfgemeinschaft+Münster+e.V.&literal.nokia_category=261}
>>> status=0 QTime=1
>>> Jun 7, 2010 9:57:48 AM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/update/extract
>>> params={literal.nokia_longitude=9.76717020670573&literal.nokia_phone=%2B497971950725&literal.nokia_type=0&literal.nokia_boost=1&literal.nokia_placerating=0&literal.id=6472737&literal.nokia_visitcount=0&literal.nokia_country=DEU&literal.nokia_housenumber=13&literal.nokia_ppid=276u0wyw-d3bed6449fcb41b0adc50ae08e041f8d&literal.nokia_language=de&literal.nokia_city=Gaildorf&literal.nokia_latitude=48.9974405924479&literal.nokia_fax=%2B497971950712&literal.nokia_postalcode=74405&literal.nokia_street=KochstraÃe&literal.nokia_title=BayWa+AG+Bau-+%26+Gartenmarkt&literal.nokia_category=194}
>>> status=0 QTime=0
>>> Jun 7, 2010 9:57:48 AM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/update/extract
>>> params={literal.nokia_longitude=9.77591044108073&literal.nokia_phone=%2B49797124009&literal.nokia_type=0&literal.nokia_boost=1&literal.nokia_district=Unterrot&literal.nokia_placerating=0&literal.id=6472739&literal.nokia_visitcount=0&literal.nokia_country=DEU&literal.nokia_housenumber=28&literal.nokia_ppid=276u0wyw-d534d7a9235a4edf878d5e32a34bad8b&literal.nokia_language=de&literal.nokia_city=Gaildorf&literal.nokia_latitude=48.9791788736979&literal.nokia_fax=%2B49797123431&literal.nokia_postalcode=74405&literal.nokia_street=HauptstraÃe&literal.nokia_title=Gastel+R.&literal.nokia_category=5}
>>> status=0 QTime=1
>>> Jun 7, 2010 9:57:48 AM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/update/extract
>>> params={literal.nokia_longitude=9.76935&literal.nokia_type=0&literal.nokia_boost=1&literal.nokia_placerating=5&literal.id=6472698&literal.nokia_visitcount=0&literal.nokia_country=DEU&literal.nokia_housenumber=15&literal.nokia_ppid=276u0wyw-9544100e68d74162aff54783b9376134&literal.nokia_language=de&literal.nokia_city=Gaildorf&literal.nokia_latitude=48.9981&literal.nokia_postalcode=74405&literal.nokia_street=KanzleistraÃe&literal.nokia_tag=Steuerberater&literal.nokia_tag=Business+%26+Service&literal.nokia_title=Consultis+GmbH&literal.nokia_category=215}
>>> status=0 QTime=92
>>> Jun 7, 2010 9:57:48 AM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/update/extract
>>> params={literal.nokia_longitude=9.77173970540364&literal.nokia_phone=%2B4979713238&literal.nokia_type=0&literal.nokia_boost=1&literal.nokia_placerating=0&literal.id=6472699&literal.nokia_visitcount=0&literal.nokia_country=DEU&literal.nokia_housenumber=37&literal.nokia_ppid=276u0wyw-9600016fd0d248c9b442111838350f64&literal.nokia_language=de&literal.nokia_city=Gaildorf&literal.nokia_latitude=48.9987182617188&literal.nokia_fax=%2B497971911639&literal.nokia_postalcode=74405&literal.nokia_street=KarlstraÃe&literal.nokia_title=Videothek,+5th+avenue+Peltekis+Apostolos&literal.nokia_category=5}
>>> status=0 QTime=93
>>> <<<<<<
>>>
>>> It is unlikely (but, of course, not out of the question) that this hiccup
>>> is due to some reentrancy problem in my test code. It is much more likely
>>> to be some kind of a Solr multi-threaded race condition - especially since
>>> it looks like a number of requests all failed at precisely the same time.
>>> This is a Solr 1.5 build from mid-late March, FWIW. Does anyone know of an
>>> extractingUpdateRequestHandler re-entrancy bug of this kind?
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>>
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>
>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]