Re: Plugins and Thread Safety
Well, I'll have to figure out how to do that, since I have modified several lines of code in there that have nothing to do with this fix. So I'll have to grab the code again and create the patch from there.

On 6/4/07, Doğacan Güney wrote:
> That's great. Can you open a JIRA issue and submit a patch for this?

--
"Conscious decisions by conscious minds are what make reality real"
Re: Plugins and Thread Safety
Hi,

On 6/4/07, Briggs wrote:
> So, I synchronized it and it seems that the problem has not repeated
> itself. I think that was it.

That's great. Can you open a JIRA issue and submit a patch for this?

--
Doğacan Güney
Re: Plugins and Thread Safety
So, I synchronized it and it seems that the problem has not repeated itself. I think that was it.

Thanks

On 6/1/07, Briggs wrote:
> I will get back to you. It isn't the easiest bug to test. So, will
> let you know soon!
Re: Plugins and Thread Safety
I will get back to you. It isn't the easiest bug to test. So, will let you know soon!

On 6/1/07, Andrzej Bialecki wrote:
> Yes, please try this change.
Re: Plugins and Thread Safety
Briggs wrote:
> Oh, you want me to change the getSorted method to be synchronized?
> I'll put a lock in there and see what happens, if that is what you are
> referring to.

Yes, please try this change.

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
Re: Plugins and Thread Safety
Oh, you want me to change the getSorted method to be synchronized? I'll put a lock in there and see what happens, if that is what you are referring to.

On 6/1/07, Andrzej Bialecki wrote:
> This is a bug. My guess is that NGramProfile.getSorted() should be
> synchronized. Could you please test if this works?
Re: Plugins and Thread Safety
Briggs wrote:
> What is the design contract on plugins when it comes to thread safety?
> I was under the assumption that plugins should be thread safe, but I
> have been running into concurrent modification exceptions from the
> language identifier plugin while indexing. My application is a bit

They should be thread-safe. E.g. Fetcher runs many threads in parallel, each thread using plugins to handle fetching, parsing, URL filtering, etc.

> different from the normal nutch way. I have many crawls going on
> concurrently within an application. So, that means I would also have
> many concurrent indexing tasks. So, if I can't be guaranteed that
> plugins are threadsafe, I may need to do a nasty thing and synchronize
> my index() method (ouch).
>
> Here is the exception, just for info:
>
> java.util.ConcurrentModificationException
>     at java.util.HashMap$HashIterator.nextEntry(HashMap.java:787)
>     at java.util.HashMap$ValueIterator.next(HashMap.java:817)
>     at org.apache.nutch.analysis.lang.NGramProfile.normalize(NGramProfile.java:277)

This is a bug. My guess is that NGramProfile.getSorted() should be synchronized. Could you please test if this works?

--
Best regards,
Andrzej Bialecki
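For readers following along, here is a minimal sketch of the kind of change suggested above. It is not the Nutch code: `SharedProfileSketch`, `add`, `total`, and `stress` are hypothetical names made up for illustration, and only the method name `getSorted` is borrowed from the thread. The assumption is that one object holding a plain `HashMap` is shared by several threads. A synchronized method only helps if every method that touches the map locks the same monitor, so both the writer and the readers are synchronized here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a shared profile object; NOT the real NGramProfile.
class SharedProfileSketch {
    private final Map<String, Integer> counts = new HashMap<>();

    // Writer: takes the lock on `this` before changing the map.
    synchronized void add(String ngram) {
        counts.merge(ngram, 1, Integer::sum);
    }

    // Reader: copies and sorts the keys while holding the same lock,
    // so no writer can change the map in the middle of the copy.
    synchronized List<String> getSorted() {
        List<String> keys = new ArrayList<>(counts.keySet());
        Collections.sort(keys);
        return keys;
    }

    // Reader: sums all counts under the lock.
    synchronized int total() {
        int sum = 0;
        for (int c : counts.values()) {
            sum += c;
        }
        return sum;
    }

    // Runs `threads` workers that each add and read `perThread` times.
    // Returns how many workers failed (or 1 more if the pool timed out).
    static int stress(SharedProfileSketch profile, int threads, int perThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger failures = new AtomicInteger();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                try {
                    for (int i = 0; i < perThread; i++) {
                        profile.add("g" + ((id + i) % 50));
                        profile.getSorted();
                    }
                } catch (RuntimeException e) {
                    failures.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
                failures.incrementAndGet();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            failures.incrementAndGet();
        }
        return failures.get();
    }

    public static void main(String[] args) {
        SharedProfileSketch profile = new SharedProfileSketch();
        int failures = stress(profile, 8, 2000);
        System.out.println("failures=" + failures + " total=" + profile.total());
    }
}
```

Whether synchronizing `getSorted()` alone is enough in the real class depends on which other methods read or modify the same map, which is why testing the change, as requested above, matters.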
Plugins and Thread Safety
What is the design contract on plugins when it comes to thread safety? I was under the assumption that plugins should be thread safe, but I have been running into concurrent modification exceptions from the language identifier plugin while indexing.

My application is a bit different from the normal Nutch way. I have many crawls going on concurrently within an application, which means I would also have many concurrent indexing tasks. So, if I can't be guaranteed that plugins are thread-safe, I may need to do a nasty thing and synchronize my index() method (ouch).

Here is the exception, just for info:

java.util.ConcurrentModificationException
    at java.util.HashMap$HashIterator.nextEntry(HashMap.java:787)
    at java.util.HashMap$ValueIterator.next(HashMap.java:817)
    at org.apache.nutch.analysis.lang.NGramProfile.normalize(NGramProfile.java:277)
    at org.apache.nutch.analysis.lang.NGramProfile.analyze(NGramProfile.java:244)
    at org.apache.nutch.analysis.lang.LanguageIdentifier.identify(LanguageIdentifier.java:409)
    at org.apache.nutch.analysis.lang.LanguageIndexingFilter.filter(LanguageIndexingFilter.java:84)
    at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:131)
    at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:240)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:313)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)

--briggs
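The top frames of the trace show a `HashMap` value iterator failing inside `normalize()`. As a rough illustration of how that exception arises in general, the sketch below changes a `HashMap` while an iterator over its values is still in use. It is deliberately single-threaded so that it fails every time. In the report above the change presumably came from another thread, which is an assumption, since the thread does not show the plugin source. `CmeDemo` and `modifyWhileIterating` are hypothetical names.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; it only illustrates HashMap's fail-fast iterators.
class CmeDemo {

    // Iterates over the map's values and inserts a new key part-way through.
    // Returns true if the iterator detected the structural change.
    static boolean modifyWhileIterating() {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("ab", 1);
        counts.put("bc", 2);
        counts.put("cd", 3);
        try {
            for (Integer value : counts.values()) {
                // Structural modification while the iterator is live.
                counts.put("new-" + value, 0);
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // prints "CME thrown: true"
        System.out.println("CME thrown: " + modifyWhileIterating());
    }
}
```

With several threads the same check fires only when the timing happens to line up, which fits the later remark in the thread that this is not the easiest bug to test.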