[ https://issues.apache.org/jira/browse/TIKA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409102#comment-13409102 ]

Jukka Zitting commented on TIKA-885:
------------------------------------

Hmm, that is a good point! I guess the best way to solve this, apart from 
making Metadata fully synchronized, would be to pass a copy of the given 
metadata object to the parsing process in the background thread, and then 
explicitly copy any updates back to the original Metadata instance when the 
client calls read() or other methods on the reader instance. A bit like how we 
handle the transmission of an exception across the threads.
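
As a rough illustration of the copy-and-merge idea above, here is a minimal, self-contained sketch. It is not Tika code: SimpleMetadata, ParseHandle, syncMetadata() and the sample keys are all hypothetical stand-ins, and the sketch assumes that the parser thread hands over frozen snapshots of its private copy, which the client thread merges into the original instance when it calls read() or a similar method.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

class MetadataHandoffSketch {

    /** Hypothetical stand-in for a metadata holder; not Tika's Metadata class. */
    static final class SimpleMetadata {
        private final Map<String, String> values = new HashMap<>();

        void set(String name, String value) { values.put(name, value); }
        String get(String name) { return values.get(name); }

        /** Returns an independent copy that shares no state with this instance. */
        SimpleMetadata copy() {
            SimpleMetadata c = new SimpleMetadata();
            c.values.putAll(values);
            return c;
        }

        /** Copies every entry of other into this instance (removals are not tracked). */
        void mergeFrom(SimpleMetadata other) { values.putAll(other.values); }
    }

    /** Hypothetical reader-side handle that owns the background parse. */
    static final class ParseHandle {
        private final SimpleMetadata original; // touched by the client thread only
        private final AtomicReference<SimpleMetadata> published = new AtomicReference<>();
        private final Thread worker;

        ParseHandle(SimpleMetadata original) {
            this.original = original;
            SimpleMetadata working = original.copy(); // touched by the parser thread only
            this.worker = new Thread(() -> {
                // Stand-in for the real parse: only the private copy is mutated.
                working.set("title", "Example");
                published.set(working.copy()); // hand over a snapshot that is never mutated
                working.set("author", "Someone");
                published.set(working.copy());
            });
            worker.start();
        }

        /** Called from the client thread, e.g. at the start of read(). */
        void syncMetadata() {
            SimpleMetadata snapshot = published.getAndSet(null);
            if (snapshot != null) {
                original.mergeFrom(snapshot);
            }
        }

        /** Waits for the background parse to finish. */
        void awaitCompletion() {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        SimpleMetadata metadata = new SimpleMetadata();
        metadata.set("resourceName", "example.doc");
        ParseHandle handle = new ParseHandle(metadata);
        handle.awaitCompletion();
        handle.syncMetadata();
        System.out.println(metadata.get("title") + " / " + metadata.get("author"));
    }
}
```

The client-visible instance is only ever touched by the client thread, so no locking is needed on it; the cost is that updates become visible only at the points where the client calls into the reader.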
                
> Possible ConcurrentModificationException while accessing Metadata produced by 
> ParsingReader
> -------------------------------------------------------------------------------------------
>
>                 Key: TIKA-885
>                 URL: https://issues.apache.org/jira/browse/TIKA-885
>             Project: Tika
>          Issue Type: Improvement
>          Components: metadata, parser
>    Affects Versions: 1.0
>         Environment: jre 1.6_25 x64 and Windows7 Enterprise x64
>            Reporter: Luis Filipe Nassif
>            Priority: Minor
>              Labels: patch
>
> Oracle's PipedReader and PipedWriter classes have a bug that prevents them 
> from executing concurrently: they notify each other only when the pipe is 
> full or empty, not after each char is read from or written to the pipe. So I 
> modified ParsingReader to use modified versions of PipedReader and 
> PipedWriter, similar to the GNU versions, that work concurrently. 
> However, sometimes, with certain files, I get the following error:
> java.util.ConcurrentModificationException
>                 at java.util.HashMap$HashIterator.nextEntry(Unknown Source)
>                 at java.util.HashMap$KeyIterator.next(Unknown Source)
>                 at java.util.AbstractCollection.toArray(Unknown Source)
>                 at org.apache.tika.metadata.Metadata.names(Metadata.java:146)
> This happens because the ParsingReader.ParsingTask thread writes metadata 
> while the ParsingReader thread is reading it, for files that contain metadata 
> beyond their initial bytes. It does not occur with the current implementation 
> because Java's PipedReader and PipedWriter block each other, which is a 
> performance bug that affects ParsingReader, but that could be fixed in a 
> future Java release. I think it would be a defensive approach to synchronize 
> access to the private Metadata.metadata Map, which could avoid a 
> possible future problem with ParsingReader.
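
For comparison, the defensive approach proposed in the issue description (synchronizing access to the underlying map) might look roughly like the sketch below. GuardedMetadata and exercise() are hypothetical names used only for illustration; this is not the Tika Metadata class, only an assumption about how a fully synchronized variant could behave.

```java
import java.util.HashMap;
import java.util.Map;

class SynchronizedMetadataSketch {

    /** Hypothetical metadata holder whose map access is fully synchronized. */
    static final class GuardedMetadata {
        private final Map<String, String> values = new HashMap<>();

        synchronized void set(String name, String value) { values.put(name, value); }
        synchronized String get(String name) { return values.get(name); }

        /** Key snapshot taken while holding the lock, so no writer can interleave. */
        synchronized String[] names() {
            return values.keySet().toArray(new String[0]);
        }
    }

    /** Runs one writer and one reader concurrently; returns the final number of names. */
    static int exercise(int entries) {
        GuardedMetadata metadata = new GuardedMetadata();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < entries; i++) {
                metadata.set("key" + i, "value" + i);
            }
        });
        writer.start();
        while (writer.isAlive()) {
            // Without the lock, iterating the key set here could fail with
            // ConcurrentModificationException while the writer is active.
            metadata.names();
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return metadata.names().length;
    }

    public static void main(String[] args) {
        System.out.println(exercise(10_000));
    }
}
```

Every accessor takes the same lock, which keeps the change small but adds locking overhead to single-threaded callers as well.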
