Re: Plugins and Thread Safety

2007-06-04 Thread Briggs

Well, I'll have to figure out how to do that since I have modified several
lines of code in there that have nothing to do with this fix. So, I'll have
to grab the code again and create the patch from there.

On 6/4/07, Doğacan Güney <[EMAIL PROTECTED]> wrote:


Hi,

On 6/4/07, Briggs <[EMAIL PROTECTED]> wrote:
> So, I synchronized it and it seems that the problem has not repeated
> itself.  I think that was it.

That's great. Can you open a JIRA issue and submit a patch for this?

>
> Thanks
>
>
> On 6/1/07, Briggs <[EMAIL PROTECTED]> wrote:
> >
> > I will get back to you.  It isn't the easiest bug to test.  So, will
> > let you know soon!
> >
> > On 6/1/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> > > Briggs wrote:
> > > > Oh, you want me to change the getSorted method to be synchronized?
> > > > I'll put a lock in there and see what happens, if that is what you are
> > > > referring to.
> > >
> > > Yes, please try this change.
> > >
> > >
> > > --
> > > Best regards,
> > > Andrzej Bialecki <><
> > >   ___. ___ ___ ___ _ _   __
> > > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > > http://www.sigram.com  Contact: info at sigram dot com
> > >
> > >
> >
> >
> > --
> > "Conscious decisions by conscious minds are what make reality real"
> >
>
>
>
> --
> "Conscious decisions by conscious minds are what make reality real"
>


--
Doğacan Güney





--
"Conscious decisions by conscious minds are what make reality real"


Re: Plugins and Thread Safety

2007-06-04 Thread Doğacan Güney

Hi,

On 6/4/07, Briggs <[EMAIL PROTECTED]> wrote:

So, I synchronized it and it seems that the problem has not repeated
itself.  I think that was it.


That's great. Can you open a JIRA issue and submit a patch for this?



Thanks


On 6/1/07, Briggs <[EMAIL PROTECTED]> wrote:
>
> I will get back to you.  It isn't the easiest bug to test.  So, will
> let you know soon!
>
> On 6/1/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> > Briggs wrote:
> > > Oh, you want me to change the getSorted method to be synchronized?
> > > I'll put a lock in there and see what happens, if that is what you are
> > > referring to.
> >
> > Yes, please try this change.
> >
> >
> > --
> > Best regards,
> > Andrzej Bialecki <><
> >   ___. ___ ___ ___ _ _   __
> > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > http://www.sigram.com  Contact: info at sigram dot com
> >
> >
>
>
> --
> "Conscious decisions by conscious minds are what make reality real"
>



--
"Conscious decisions by conscious minds are what make reality real"




--
Doğacan Güney


Re: Plugins and Thread Safety

2007-06-04 Thread Briggs

So, I synchronized it and it seems that the problem has not repeated
itself.  I think that was it.

Thanks


On 6/1/07, Briggs <[EMAIL PROTECTED]> wrote:


I will get back to you.  It isn't the easiest bug to test.  So, will
let you know soon!

On 6/1/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Briggs wrote:
> > Oh, you want me to change the getSorted method to be synchronized?
> > I'll put a lock in there and see what happens, if that is what you are
> > referring to.
>
> Yes, please try this change.
>
>
> --
> Best regards,
> Andrzej Bialecki <><
>   ___. ___ ___ ___ _ _   __
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>


--
"Conscious decisions by conscious minds are what make reality real"





--
"Conscious decisions by conscious minds are what make reality real"


Re: Plugins and Thread Safety

2007-06-01 Thread Briggs

I will get back to you.  It isn't the easiest bug to test.  So, will
let you know soon!
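
For anyone else trying to reproduce this kind of race: a rough stress harness along these lines may help. It is only a sketch. `RaceStressSketch`, `Counter`, and `stress` are hypothetical names made up for illustration, not Nutch classes, and the thread and round counts are arbitrary. The idea is to have several workers write to and iterate over one shared object, then count how many workers died with an exception.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RaceStressSketch {

    // Hypothetical shared state; both methods hold the same lock (the instance monitor).
    public static final class Counter {
        private final Map<String, Integer> counts = new HashMap<>();

        public synchronized void add(String key) {
            counts.merge(key, 1, Integer::sum);
        }

        // Iterates over the map values, the kind of loop that fails fast when unguarded.
        public synchronized int total() {
            int sum = 0;
            for (int v : counts.values()) {
                sum += v;
            }
            return sum;
        }
    }

    // Runs `threads` workers, each writing and iterating `rounds` times.
    // Returns how many workers ended with an exception.
    public static int stress(Counter shared, int threads, int rounds)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> results = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            results.add(pool.submit(() -> {
                for (int i = 0; i < rounds; i++) {
                    shared.add("k" + id + "-" + (i % 100));
                    shared.total();
                }
            }));
        }
        pool.shutdown();
        int failures = 0;
        for (Future<?> f : results) {
            try {
                f.get(); // rethrows anything the worker threw, wrapped in ExecutionException
            } catch (ExecutionException e) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) throws InterruptedException {
        Counter shared = new Counter();
        int failures = stress(shared, 8, 2000);
        System.out.println("failed workers: " + failures + ", total adds: " + shared.total());
    }
}
```

With both methods synchronized, no worker should fail. Dropping the `synchronized` keywords makes a ConcurrentModificationException likely but not guaranteed on any given run, which is why a clean run alone does not prove the absence of a race.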

On 6/1/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:

Briggs wrote:
> Oh, you want me to change the getSorted method to be synchronized?
> I'll put a lock in there and see what happens, if that is what you are
> referring to.

Yes, please try this change.


--
Best regards,
Andrzej Bialecki <><
  ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com





--
"Conscious decisions by conscious minds are what make reality real"


Re: Plugins and Thread Safety

2007-06-01 Thread Andrzej Bialecki

Briggs wrote:

Oh, you want me to change the getSorted method to be synchronized?
I'll put a lock in there and see what happens, if that is what you are
referring to.


Yes, please try this change.


--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Plugins and Thread Safety

2007-06-01 Thread Briggs

Oh, you want me to change the getSorted method to be synchronized?
I'll put a lock in there and see what happens, if that is what you are
referring to.


On 6/1/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:

Briggs wrote:
> What is the design contract on plugins when it comes to thread safety?
> I was under the assumption that plugins should be thread safe, but I
> have been running into concurrent modification exceptions from the
> language identifier plugin while indexing.  My application is a bit

They should be thread-safe. E.g. Fetcher runs many threads in parallel,
each thread using plugins to handle fetching, parsing, url filtering,
etc, etc.


> different from the normal nutch way.  I have many crawls going on
> concurrently within an application.  So, that means I would also have
> many concurrent indexing tasks.  So, if I can't be guaranteed that
> plugins are threadsafe, I may need to do a nasty thing and synchronize
> my index() method (ouch).
>
>
> Here is the exception, just for info:
>
> java.util.ConcurrentModificationException
>    at java.util.HashMap$HashIterator.nextEntry(HashMap.java:787)
>    at java.util.HashMap$ValueIterator.next(HashMap.java:817)
>    at org.apache.nutch.analysis.lang.NGramProfile.normalize(NGramProfile.java:277)

This is a bug. My guess is that NGramProfile.getSorted() should be
synchronized. Could you please test if this works?

--
Best regards,
Andrzej Bialecki <><
  ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com





--
"Conscious decisions by conscious minds are what make reality real"


Re: Plugins and Thread Safety

2007-06-01 Thread Andrzej Bialecki

Briggs wrote:

> What is the design contract on plugins when it comes to thread safety?
> I was under the assumption that plugins should be thread safe, but I
> have been running into concurrent modification exceptions from the
> language identifier plugin while indexing.  My application is a bit

They should be thread-safe. E.g. Fetcher runs many threads in parallel,
each thread using plugins to handle fetching, parsing, url filtering,
etc, etc.

> different from the normal nutch way.  I have many crawls going on
> concurrently within an application.  So, that means I would also have
> many concurrent indexing tasks.  So, if I can't be guaranteed that
> plugins are threadsafe, I may need to do a nasty thing and synchronize
> my index() method (ouch).
>
> Here is the exception, just for info:
>
> java.util.ConcurrentModificationException
>    at java.util.HashMap$HashIterator.nextEntry(HashMap.java:787)
>    at java.util.HashMap$ValueIterator.next(HashMap.java:817)
>    at org.apache.nutch.analysis.lang.NGramProfile.normalize(NGramProfile.java:277)


This is a bug. My guess is that NGramProfile.getSorted() should be 
synchronized. Could you please test if this works?
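
To illustrate the kind of change meant here, a minimal sketch follows. It is not the Nutch source: `SharedProfileSketch` and its `add` method are hypothetical stand-ins, and only the name `getSorted` is borrowed from the discussion. The assumption is that one thread iterates over a plain HashMap while another modifies it, which is what makes the iterator throw ConcurrentModificationException. Putting the writer and the iterating reader under the same lock removes that overlap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a shared n-gram profile; NOT the real NGramProfile.
public class SharedProfileSketch {
    private final Map<String, Integer> counts = new HashMap<>();

    // Writers take the same lock as readers, so the map never changes mid-iteration.
    public synchronized void add(String ngram) {
        counts.merge(ngram, 1, Integer::sum);
    }

    // Copies and sorts the keys while holding the lock; callers get a private list.
    public synchronized List<String> getSorted() {
        List<String> keys = new ArrayList<>(counts.keySet());
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) throws InterruptedException {
        SharedProfileSketch profile = new SharedProfileSketch();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 10000; i++) {
                profile.add("g" + (i % 500));
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                profile.getSorted();
            }
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
        System.out.println(profile.getSorted().size()); // prints 500
    }
}
```

Synchronizing only the reading method is not enough if some other method modifies the map without the lock, so every access to the shared map would need to go through the same monitor.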


--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Plugins and Thread Safety

2007-06-01 Thread Briggs

What is the design contract on plugins when it comes to thread safety?
I was under the assumption that plugins should be thread safe, but I
have been running into concurrent modification exceptions from the
language identifier plugin while indexing.  My application is a bit
different from the normal nutch way.  I have many crawls going on
concurrently within an application.  So, that means I would also have
many concurrent indexing tasks.  So, if I can't be guaranteed that
plugins are threadsafe, I may need to do a nasty thing and synchronize
my index() method (ouch).


Here is the exception, just for info:

java.util.ConcurrentModificationException
   at java.util.HashMap$HashIterator.nextEntry(HashMap.java:787)
   at java.util.HashMap$ValueIterator.next(HashMap.java:817)
   at org.apache.nutch.analysis.lang.NGramProfile.normalize(NGramProfile.java:277)
   at org.apache.nutch.analysis.lang.NGramProfile.analyze(NGramProfile.java:244)
   at org.apache.nutch.analysis.lang.LanguageIdentifier.identify(LanguageIdentifier.java:409)
   at org.apache.nutch.analysis.lang.LanguageIndexingFilter.filter(LanguageIndexingFilter.java:84)
   at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:131)
   at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:240)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:313)
   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)


--briggs


"Conscious decisions by conscious minds are what make reality real"