[ 
https://issues.apache.org/jira/browse/TIKA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426842#comment-13426842
 ] 

Jukka Zitting commented on TIKA-885:
------------------------------------

What I had in mind was something like a {{Metadata.copyFrom(Metadata)}} method 
that would copy all metadata from one instance to another. We'd then have three 
{{Metadata}} instances, one for the client, one for the parser and a shared one 
for passing updates from the parser to the client. Each {{write()}} in the 
background parser would do something like:

{code}
synchronized (sharedMetadata) {
    sharedMetadata.copyFrom(parserMetadata);
}
{code}

... and each {{read()}} by the client would do:

{code}
synchronized (sharedMetadata) {
    clientMetadata.copyFrom(sharedMetadata);
}
{code}

It's not terribly elegant, but it should avoid the need to make all {{Metadata}} 
instances thread-safe.
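
To make the three-instance idea concrete, here is a minimal, self-contained sketch. Note that {{SimpleMetadata}} and {{MetadataHandoff}} are hypothetical stand-ins invented for illustration, and {{copyFrom}} is only the method proposed above, not an existing Tika API. The stand-in also holds a single value per name, which is a simplification.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for Tika's Metadata; deliberately NOT thread-safe. */
class SimpleMetadata {
    private final Map<String, String> values = new HashMap<>();

    void set(String name, String value) { values.put(name, value); }

    String get(String name) { return values.get(name); }

    /** The proposed copyFrom: copies every entry of 'other' into this instance. */
    void copyFrom(SimpleMetadata other) { values.putAll(other.values); }
}

/** Passes metadata from the parser thread to the client thread via a locked shared copy. */
class MetadataHandoff {
    private final SimpleMetadata parserMetadata = new SimpleMetadata(); // touched by the parser thread only
    private final SimpleMetadata sharedMetadata = new SimpleMetadata(); // accessed only while holding its lock
    private final SimpleMetadata clientMetadata = new SimpleMetadata(); // touched by the client thread only

    SimpleMetadata parserMetadata() { return parserMetadata; }

    SimpleMetadata clientMetadata() { return clientMetadata; }

    /** Would be called by the background parser on each write(). */
    void publish() {
        synchronized (sharedMetadata) {
            sharedMetadata.copyFrom(parserMetadata);
        }
    }

    /** Would be called by the client on each read(). */
    void refresh() {
        synchronized (sharedMetadata) {
            clientMetadata.copyFrom(sharedMetadata);
        }
    }
}
```

In this sketch only {{sharedMetadata}} is ever touched by both threads, and always under its own lock, so the parser and client instances can stay unsynchronized. Because the copy is based on {{putAll}}, it adds and overwrites entries but does not propagate removals.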

bq. customized versions of PipedReader and PipedWriter classes that work 
concurrently

I'm not sure I understand. Perhaps you could describe the idea in more detail 
either on the dev@ list or in a separate improvement issue.
                
> Possible ConcurrentModificationException while accessing Metadata produced by 
> ParsingReader
> -------------------------------------------------------------------------------------------
>
>                 Key: TIKA-885
>                 URL: https://issues.apache.org/jira/browse/TIKA-885
>             Project: Tika
>          Issue Type: Improvement
>          Components: metadata, parser
>    Affects Versions: 1.0
>         Environment: jre 1.6_25 x64 and Windows7 Enterprise x64
>            Reporter: Luis Filipe Nassif
>            Priority: Minor
>              Labels: patch
>
> Oracle's PipedReader and PipedWriter classes have a bug that prevents them 
> from executing concurrently: they notify each other only when the pipe is 
> full or empty, not after each char is read from or written to the pipe. So I 
> modified ParsingReader to use modified versions of PipedReader and 
> PipedWriter, similar to the GNU versions, that work concurrently. 
> However, sometimes, with certain files, I get the following error:
> java.util.ConcurrentModificationException
>                 at java.util.HashMap$HashIterator.nextEntry(Unknown Source)
>                 at java.util.HashMap$KeyIterator.next(Unknown Source)
>                 at java.util.AbstractCollection.toArray(Unknown Source)
>                 at org.apache.tika.metadata.Metadata.names(Metadata.java:146)
> This happens because the ParsingReader.ParsingTask thread writes metadata 
> while the ParsingReader thread is reading it, for files that contain metadata 
> beyond their initial bytes. It does not occur with the current implementation, 
> because Java's PipedReader and PipedWriter block each other, which is a 
> performance bug that affects ParsingReader, but they could be fixed in a 
> future Java release. I think it would be a defensive approach to synchronize 
> access to the private Metadata.metadata Map, which could avoid a 
> possible future problem when using ParsingReader.


        
