[
https://issues.apache.org/jira/browse/TIKA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475865#comment-13475865
]
Luis Filipe Nassif edited comment on TIKA-885 at 10/16/12 12:46 PM:
--------------------------------------------------------------------
Ok, I got the idea. I think it will solve the problem. Setting a flag on
metadata changes and testing for it on reads and writes would save
unnecessary copies and synchronization.
I will open a new issue describing the improvement to PipedReader and
PipedWriter for use with ParsingReader.
was (Author: lfcnassif):
Ok, I got the idea. I think it will solve the problem. Setting a flag on
metadata changes and testing for it on reads and writes would save
unnecessary copies and synchronization.
I have opened a new issue, TIKA-1007, describing the improvement to PipedReader
and PipedWriter for use with ParsingReader.
> Possible ConcurrentModificationException while accessing Metadata produced by
> ParsingReader
> -------------------------------------------------------------------------------------------
>
> Key: TIKA-885
> URL: https://issues.apache.org/jira/browse/TIKA-885
> Project: Tika
> Issue Type: Improvement
> Components: metadata, parser
> Affects Versions: 1.0
> Environment: jre 1.6_25 x64 and Windows7 Enterprise x64
> Reporter: Luis Filipe Nassif
> Priority: Minor
> Labels: patch
>
> Oracle's PipedReader and PipedWriter classes have a bug that prevents them
> from executing concurrently: they notify each other only when the pipe is
> full or empty, not after each char is read from or written to the pipe. So I
> modified ParsingReader to use modified versions of PipedReader and
> PipedWriter, similar to the GNU versions, that work concurrently.
> However, sometimes, and with certain files, I get the following error:
> java.util.ConcurrentModificationException
> at java.util.HashMap$HashIterator.nextEntry(Unknown Source)
> at java.util.HashMap$KeyIterator.next(Unknown Source)
> at java.util.AbstractCollection.toArray(Unknown Source)
> at org.apache.tika.metadata.Metadata.names(Metadata.java:146)
> This happens because the ParsingReader.ParsingTask thread writes metadata
> while the ParsingReader thread is reading it, for files that contain metadata
> beyond their initial bytes. It does not occur with the current implementation,
> because Java's PipedReader and PipedWriter block each other, which is a
> performance bug that affects ParsingReader, but they could be fixed in a
> future Java release. I think it would be a defensive approach to synchronize
> access to the private Metadata.metadata Map, which could avoid a
> possible future problem when using ParsingReader.
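
The defensive approach proposed in the description could look roughly like
the sketch below. It is only an illustration: SafeMetadata is a hypothetical,
simplified stand-in for org.apache.tika.metadata.Metadata (not Tika's actual
code), and its set, get and names methods are assumptions modelled on the
names() call in the stack trace. Every access to the private map takes the
same monitor, so names() can never iterate the HashMap while another thread is
in the middle of a put.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for org.apache.tika.metadata.Metadata.
// It is NOT Tika's actual implementation.
class SafeMetadata {
    // Plays the role of the private Metadata.metadata map.
    private final Map<String, String> metadata = new HashMap<String, String>();

    // Called by the writing (parsing) thread.
    public synchronized void set(String name, String value) {
        metadata.put(name, value);
    }

    public synchronized String get(String name) {
        return metadata.get(name);
    }

    // Copies the keys while holding the same monitor as set(), so the
    // HashMap iterator never overlaps a concurrent put().
    public synchronized String[] names() {
        return metadata.keySet().toArray(new String[0]);
    }

    public static void main(String[] args) throws InterruptedException {
        final SafeMetadata m = new SafeMetadata();
        // Writer thread: mimics a parsing task that keeps adding metadata.
        Thread writer = new Thread(new Runnable() {
            public void run() {
                for (int i = 0; i < 100000; i++) {
                    m.set("key" + i, "value" + i);
                }
            }
        });
        writer.start();
        // Reader: keeps listing names while the writer is still running.
        while (writer.isAlive()) {
            m.names();
        }
        writer.join();
        System.out.println(m.names().length); // prints 100000
    }
}
```

With the synchronized keywords removed, the reader loop in main may fail with
the same java.util.ConcurrentModificationException shown in the stack trace,
although the failure is timing-dependent and will not show up on every run.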