Some results of looking at commons-fileupload... There are 4 source files that have "catch" in them:

./src/java/org/apache/commons/fileupload/MultipartStream.java
./src/java/org/apache/commons/fileupload/util/Streams.java
./src/java/org/apache/commons/fileupload/FileUploadBase.java
./src/java/org/apache/commons/fileupload/disk/DiskFileItem.java

Of these 4, it looks like two have the potential for silently eating thrown IOExceptions:

./src/java/org/apache/commons/fileupload/MultipartStream.java
./src/java/org/apache/commons/fileupload/disk/DiskFileItem.java

DiskFileItem can eat exceptions if it gets them upon reading a multipart section from disk. MultipartStream can eat exceptions if there are any I/O errors reading the input stream. I suspect the latter is what might be happening here. If anyone wants to verify this: the code in question first converts IOExceptions to MalformedStreamExceptions. Then, later, it eats most MalformedStreamExceptions and treats the stream as being empty (a schematic of the pattern is sketched below).

Karl
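A rough schematic of that two-step failure mode (hypothetical code written to illustrate the described pattern, not the actual MultipartStream source; PartReader and BOUNDARY_MARKER are made-up names):

    import java.io.IOException;
    import java.io.InputStream;

    class MalformedStreamException extends IOException {
        MalformedStreamException(String message) { super(message); }
    }

    class PartReader {
        private static final int BOUNDARY_MARKER = 0x2D; // '-', stand-in boundary byte
        private final InputStream in;

        PartReader(InputStream in) { this.in = in; }

        // Step 1: a low-level read wraps any IOException -- including a
        // socket reset -- in a MalformedStreamException, discarding the cause.
        int readByte() throws MalformedStreamException {
            int b;
            try {
                b = in.read();
            } catch (IOException e) {
                throw new MalformedStreamException("Stream ended unexpectedly");
            }
            if (b == -1) {
                throw new MalformedStreamException("Stream ended unexpectedly");
            }
            return b;
        }

        // Step 2: a higher-level caller swallows the wrapper exception, so a
        // genuine I/O error on a partially received post is indistinguishable
        // from the stream simply having no more parts.
        boolean readBoundary() {
            try {
                while (readByte() != BOUNDARY_MARKER) {
                    // keep scanning for the boundary
                }
                return true;   // boundary found
            } catch (MalformedStreamException e) {
                return false;  // I/O error silently becomes "no more parts"
            }
        }
    }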
-----Original Message-----
From: Wright Karl (Nokia-S/Cambridge)
Sent: Thursday, June 10, 2010 1:34 PM
To: [email protected]
Subject: RE: Solr spewage and dropped documents, while indexing

Hmmm. I did a run of the proposed change and it did not help. If anything, the system behaved worse and generated many more 400s than before. So the change is probably having the intended effect, but the extra file-deletion time is interfering even further with "Solr keeping up".

Further analysis shows that there are actually two problems. The first problem is that perfectly reasonable documents sometimes generate 400s. The second problem is the connection reset (which is what actually kills the client), and which could well be due to a socket timeout. The client reports this trace, which simply shows that the post-response socket was closed by somebody on the server end:

java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:168)
        at HttpPoster.getResponse(HttpPoster.java:280)
        at HttpPoster.indexPost(HttpPoster.java:191)
        at ParseAndLoad$PostThread.run(ParseAndLoad.java:638)

Could the 400 error be due to a similar socket timeout issue? Well, that would depend on whether commons-fileupload is capable of silently eating socket exceptions and instead truncating the post it has partially received. And, of course, on what Jetty's default socket parameters look like. Can anyone save me some time and give me a pointer to where/how/what those parameters are set to, for the example?

Karl

-----Original Message-----
From: Wright Karl (Nokia-S/Cambridge)
Sent: Wednesday, June 09, 2010 11:24 AM
To: [email protected]
Subject: RE: Solr spewage and dropped documents, while indexing

Ah, the old "misleading documentation" trick! I'll have to give this a try and see if my problem goes away.

Karl

-----Original Message-----
From: ext Mark Miller [mailto:[email protected]]
Sent: Wednesday, June 09, 2010 11:19 AM
To: [email protected]
Subject: Re: Solr spewage and dropped documents, while indexing

Hang on though - I saw a commons JIRA issue from '08 that claimed the javadoc for this class was misleading and that there was no default cleaner set. That issue was resolved, but the javadoc *still* seemed to indicate there was a default cleaner in use... so I wondered whether the code had changed, or the javadoc was still misleading.

Looking at getFileCleaningTracker(), it also says:

    An instance of FileCleaningTracker, defaults to FileCleaner.getInstance().

But then looking at the code, I don't see how that is possible. It really appears to default to null (no cleaner). So I ran a quick test, printing out the cleaning tracker, and it prints 'null'.
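That quick test amounts to something like this (a minimal sketch, assuming the commons-fileupload 1.2.x API in which DiskFileItemFactory exposes getFileCleaningTracker()):

    import org.apache.commons.fileupload.disk.DiskFileItemFactory;

    public class TrackerCheck {
        public static void main(String[] args) {
            DiskFileItemFactory factory = new DiskFileItemFactory();
            // The javadoc claims this defaults to FileCleaner.getInstance(),
            // but it prints "null" -- i.e. no cleaner is actually installed.
            System.out.println(factory.getFileCleaningTracker());
        }
    }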
So, perhaps we try setting one and see where your problem is? It really appears the javadoc I'm seeing does not match the code.

- Mark

On 6/9/10 8:01 AM, [email protected] wrote:
> Ok, that theory bites the dust then...
>
> I'll have to work on some diagnostics then to see why the content doesn't
> get added.
>
> Karl
>
> -----Original Message-----
> From: ext Mark Miller [mailto:[email protected]]
> Sent: Wednesday, June 09, 2010 10:39 AM
> To: [email protected]
> Subject: Re: Solr spewage and dropped documents, while indexing
>
> On 6/9/10 6:01 AM, [email protected] wrote:
>>
>> but if I correctly recall how DiskFileItemFactory works, it creates
>> files and registers them to be cleaned up on JVM exit. If that's the
>> only cleanup, that's not going to cut it for a real-world system.
>
> Class DiskFileItemFactory
>
> "Temporary files are automatically deleted as soon as they are no longer
> needed. (More precisely, when the corresponding instance of File is
> garbage collected.) Cleaning up those files is done by an instance of
> FileCleaningTracker, and an associated thread."

--
- Mark
http://www.lucidimagination.com
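For reference, "setting one" as Mark suggests would look roughly like the following sketch (assuming commons-fileupload 1.2.x with commons-io on the classpath; newFactory is a made-up helper name):

    import org.apache.commons.fileupload.disk.DiskFileItemFactory;
    import org.apache.commons.io.FileCleaningTracker;

    public class TrackerSetup {
        public static DiskFileItemFactory newFactory() {
            DiskFileItemFactory factory = new DiskFileItemFactory();
            // With a tracker explicitly installed, the temp files backing
            // uploaded parts are deleted once their DiskFileItem is
            // garbage collected, rather than lingering until JVM exit.
            factory.setFileCleaningTracker(new FileCleaningTracker());
            return factory;
        }
    }

In a long-running server, the tracker's reaper thread would presumably also need to be stopped at shutdown, via FileCleaningTracker's exitWhenFinished().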
