[ https://issues.apache.org/jira/browse/TIKA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426842#comment-13426842 ]
Jukka Zitting commented on TIKA-885:
------------------------------------

What I had in mind was something like a {{Metadata.copyFrom(Metadata)}} method that would copy all metadata from one instance to another. We'd then have three {{Metadata}} instances: one for the client, one for the parser, and a shared one for passing updates from the parser to the client. Each {{write()}} in the background parser would do something like:

{code}
synchronized (sharedMetadata) {
    sharedMetadata.copyFrom(parserMetadata);
}
{code}

... and each {{read()}} by the client would do:

{code}
synchronized (sharedMetadata) {
    clientMetadata.copyFrom(sharedMetadata);
}
{code}

It's not terribly elegant, but should avoid the need to make all {{Metadata}} instances thread-safe.

bq. customized versions of PipedReader and PipedWriter classes that work concurrently

I'm not sure I understand. Perhaps you could describe the idea in more detail either on the dev@ list or in a separate improvement issue.

> Possible ConcurrentModificationException while accessing Metadata produced by ParsingReader
> -------------------------------------------------------------------------------------------
>
>                 Key: TIKA-885
>                 URL: https://issues.apache.org/jira/browse/TIKA-885
>             Project: Tika
>          Issue Type: Improvement
>          Components: metadata, parser
>    Affects Versions: 1.0
>         Environment: jre 1.6_25 x64 and Windows7 Enterprise x64
>            Reporter: Luis Filipe Nassif
>            Priority: Minor
>              Labels: patch
>
> Oracle PipedReader and PipedWriter classes have a bug that do not allow them
> to execute concurrently, because they notify each other only when the pipe is
> full or empty, and do not after a char is read or written to the pipe. So i
> modified ParsingReader to use modified versions of PipedReader and
> PipedWriter, similar to gnu versions of them, that work concurrently.
> However, sometimes and with certain files, I am getting the following error:
>
> java.util.ConcurrentModificationException
>         at java.util.HashMap$HashIterator.nextEntry(Unknown Source)
>         at java.util.HashMap$KeyIterator.next(Unknown Source)
>         at java.util.AbstractCollection.toArray(Unknown Source)
>         at org.apache.tika.metadata.Metadata.names(Metadata.java:146)
>
> This happens because the ParsingReader.ParsingTask thread is writing metadata
> while it is being read by the ParsingReader thread, for files that contain
> metadata beyond their initial bytes. It will not occur with the current
> implementation, because the Java PipedReader and PipedWriter block each other,
> which is a performance bug that affects ParsingReader, but they could be fixed
> in a future Java release. I think it would be a defensive approach to
> synchronize access to the private Metadata.metadata Map, which could avoid a
> possible future problem when using ParsingReader.
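
For illustration, the three-instance hand-off proposed in the comment above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Tika code: {{SimpleMetadata}} and {{MetadataHandoff}} are hypothetical names, {{SimpleMetadata}} is a single-valued stand-in for Tika's multi-valued {{Metadata}}, and {{copyFrom}} is the method suggested in the comment, implemented here with put-all semantics (entries removed from the source are not removed from the target).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for Tika's Metadata; deliberately NOT thread-safe. */
class SimpleMetadata {
    private final Map<String, String> values = new LinkedHashMap<>();

    void set(String name, String value) { values.put(name, value); }

    String get(String name) { return values.get(name); }

    String[] names() { return values.keySet().toArray(new String[0]); }

    /** Assumed semantics of the proposed copyFrom: copy every entry of other into this. */
    void copyFrom(SimpleMetadata other) {
        values.putAll(other.values);
    }
}

/** Passes metadata from the parser thread to the client thread via a locked shared copy. */
class MetadataHandoff {
    private final SimpleMetadata shared = new SimpleMetadata();

    /** Parser side: would be called from each write() in the background parser. */
    void publish(SimpleMetadata parserMetadata) {
        synchronized (shared) {
            shared.copyFrom(parserMetadata);
        }
    }

    /** Client side: would be called from each read() by the client. */
    void refresh(SimpleMetadata clientMetadata) {
        synchronized (shared) {
            clientMetadata.copyFrom(shared);
        }
    }
}
```

The parser-owned and client-owned instances are each touched by one thread only, so only the shared instance needs the lock. The client therefore sees metadata as of the parser's most recent {{publish}} call, and never iterates a map that another thread is modifying.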