Great work, thanks Alexis! Maybe it's time to close out GORA-22 then 
and leave any future things that crop up as new issues. 

Cheers,
Chris

On Oct 1, 2011, at 4:07 AM, Alexis wrote:

> The latest revision, 1177960, should now fix the thread-safety issue:
> 
> http://svn.apache.org/viewvc/incubator/gora/trunk/gora-cassandra/src/main/java/org/apache/gora/cassandra/store/CassandraStore.java?r1=1177960&r2=1177959&pathrev=1177960
> 
> Please comment on https://issues.apache.org/jira/browse/GORA-22 if
> there is anything else.
> 
> Alexis
> 
> On Sun, Sep 4, 2011 at 10:43 AM, Alexis <[email protected]> wrote:
>> Hi,
>> 
>> I submitted the patch for peer review by just attaching it to the
>> issue: https://issues.apache.org/jira/browse/GORA-22
>> 
>> See this article about concurrency and HashMap for background on the topic:
>> http://www.ibm.com/developerworks/java/library/j-jtp07233/index.html
>> 
>> I ended up calling toArray over the key set to get around the
>> ConcurrentModificationException thrown by default with
>> java.util.HashMap when iterating over the keys.
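
A minimal sketch of the toArray workaround described above, not the actual GORA-22 patch. The class and method names are hypothetical, and the example is deliberately single-threaded: it only shows that walking a copy of the keys tolerates changes to the map, whereas walking the live key set fails fast. A snapshot alone does not make java.util.HashMap safe when other threads write to it at the same time.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of Gora.
final class SnapshotIterationSketch {

    // Iterates the live key set while adding keys; returns true if the
    // fail-fast iterator throws ConcurrentModificationException.
    static boolean liveIterationFails(Map<String, Integer> buffer) {
        try {
            for (String key : buffer.keySet()) {
                buffer.put(key + "-new", 0); // structural change mid-iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Copies the keys with toArray first, then drains the map while
    // walking the copy; returns the number of entries removed.
    static int flushWithSnapshot(Map<String, Integer> buffer) {
        Object[] keys = buffer.keySet().toArray();
        int flushed = 0;
        for (Object key : keys) {
            if (buffer.remove(key) != null) {
                flushed++;
            }
        }
        return flushed;
    }

    public static void main(String[] args) {
        Map<String, Integer> buffer = new HashMap<>();
        buffer.put("a", 1);
        buffer.put("b", 2);
        buffer.put("c", 3);
        System.out.println("live iteration failed: "
                + liveIterationFails(new HashMap<>(buffer)));
        System.out.println("flushed via snapshot: " + flushWithSnapshot(buffer));
    }
}
```

The copy costs one extra array per flush, which is usually small next to the I/O of the flush itself.
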
>> 
>> I did not hit them that many times, but I encountered Cassandra
>> crashes and Hector exceptions (usually because of GC triggered by the
>> Cassandra daemon?) on my poor 5-year-old laptop while running the
>> Nutch parse command, which is very CPU- and I/O-intensive. With the
>> settings in mapred-site.xml (see attached config), it worked out once
>> the read batch was kept reasonable (400 rows at a time) and kept
>> separate from the write batch (for example, 843 rows written per
>> batch) so that the two don't happen simultaneously.
>> 
>> 
>> Alexis
>> 
>> On Tue, Aug 30, 2011 at 1:24 AM, Alexis <[email protected]> wrote:
>>> Hi Tom,
>>> 
>>> Thanks for testing Nutch 2.0 & Cassandra and reporting the obvious
>>> bug. I must say development and testing on Gora & Nutch are not very
>>> active, but at least there is some.
>>> 
>>> 
>>> 1. As regards your ConcurrentModification issue, it looks like it
>>> happens when flushing the store. From your exception stacktrace:
>>> (Line 192 in org.apache.gora.cassandra.store.CassandraStore)
>>>    for (K key: this.buffer.keySet()) {
>>> 
>>> while there are other threads adding new keys to the HashMap:
>>> 
>>> (Line 266)
>>>    this.buffer.put(key, p);
>>> 
>>> "it is not generally permissible for one thread to modify a Collection
>>> while another thread is iterating over it."
>>> 
>>> Let me try to reproduce the bug and fix it with this in mind:
>>> How about introducing some mutex / lock mechanism with
>>> java.util.concurrent.locks.Lock or, more simply, using a thread-safe
>>> implementation such as java.util.concurrent.ConcurrentHashMap?
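
To illustrate the ConcurrentHashMap option suggested above, here is a rough sketch under stated assumptions. ConcurrentBufferSketch and its methods are hypothetical stand-ins for a buffering store, not Gora's CassandraStore, and the "write" is reduced to a counter. It relies on ConcurrentHashMap iterators being weakly consistent, so a flush can run while other threads keep adding keys without a ConcurrentModificationException.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a buffering store; not Gora's CassandraStore.
final class ConcurrentBufferSketch {
    private final Map<String, String> buffer = new ConcurrentHashMap<>();
    private final AtomicInteger flushed = new AtomicInteger();

    void put(String key, String value) {
        buffer.put(key, value);
    }

    // Drains whatever is buffered right now. Keys added during the loop
    // are either seen by this pass or left for the next flush.
    void flush() {
        for (String key : buffer.keySet()) {
            String value = buffer.remove(key);
            if (value != null) {
                flushed.incrementAndGet(); // a real store would write here
            }
        }
    }

    int flushedCount() {
        return flushed.get();
    }

    // Starts several writer threads that put unique keys and flush
    // periodically, then returns the total number of flushed rows.
    static int runDemo(int writers, int rowsPerWriter) {
        ConcurrentBufferSketch store = new ConcurrentBufferSketch();
        Thread[] threads = new Thread[writers];
        for (int w = 0; w < writers; w++) {
            final int id = w;
            threads[w] = new Thread(() -> {
                for (int i = 0; i < rowsPerWriter; i++) {
                    store.put(id + ":" + i, "row");
                    if (i % 100 == 0) {
                        store.flush();
                    }
                }
            });
            threads[w].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        store.flush(); // pick up anything still buffered
        return store.flushedCount();
    }

    public static void main(String[] args) {
        System.out.println("rows flushed: " + runDemo(4, 1000));
    }
}
```

Because every key is unique and each successful remove is counted once, no row is flushed twice or lost in this sketch. An explicit Lock around put and flush would also work, at the cost of serializing the writers.
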
>>> 
>>> 
>>> 2. Regarding the OutOfMemory error, maybe try decreasing the flushing
>>> frequency as described here?
>>> http://techvineyard.blogspot.com/2011/02/gora-orm-framework-for-hadoop-jobs.html#I_O_Frequency
>>> 
>>> I like to use the jvisualvm utility from the JDK that monitors the
>>> memory usage and tells you how this evolves during the execution of
>>> the class...
>>> 
>>> Alexis
>>> 
>>> On Mon, Aug 29, 2011 at 1:50 PM, Tom Davidson <[email protected]> wrote:
>>>> Hi Lewis,
>>>> 
>>>> I was running Nutch deployed with a dedicated Cassandra cluster. Frankly, 
>>>> I have given up on using Nutch 2 at this time as it seems highly unstable 
>>>> and not really in active development. Your effort to address this is 
>>>> encouraging. Because Nutch uses multithreading in the fetchers, I was 
>>>> getting ConcurrentModification errors and OutOfMemory errors on a regular 
>>>> basis in the CassandraStore. As far as I recall, the caching/flushing 
>>>> implementation is just not thread safe. If the CassandraStore caching were
>>>> completely removed, it might work, but it would probably not be very efficient.
>>>> If I were to fix this class, I would try to rewrite it to use Hector 
>>>> batched mutations instead.
>>>> 
>>>> Tom
>>>> 
>>>> -----Original Message-----
>>>> From: lewis john mcgibbney [mailto:[email protected]]
>>>> Sent: Monday, August 29, 2011 1:41 PM
>>>> To: [email protected]; [email protected]
>>>> Subject: Re: Gora CassandraStore is not thread safe?
>>>> 
>>>> Hi Tom,
>>>> 
>>>> Apologies for cross posting, this would not usually be the case but I'm
>>>> hoping that if any results come from the thread then both communities can
>>>> benefit.
>>>> 
>>>> I'm in the process of getting Cassandra 0.8.4 working with Nutch 2.0 and
>>>> Gora 0.2 myself and seem to be having some nasty problems.
>>>> 
>>>> Some questions for you
>>>> 
>>>> 1) How are you running Nutch, local or deploy?
>>>> 2) How are you running Cassandra, local or deployed in a cluster?
>>>> 
>>>> The obvious thoughts are that this is a bug and that there are
>>>> method(s)/object(s) which are not safe.
>>>> 
>>>> Have you gotten any further with this?
>>>> 
>>>> Lewis
>>>> 
>>>> 
>>>> On Wed, Aug 10, 2011 at 8:43 PM, Tom Davidson <[email protected]> 
>>>> wrote:
>>>> 
>>>>> Has anyone tested the CassandraStore in gora 0.2 using multiple threads?
>>>>>  The nutch 2 fetcher architecture has many threads writing to one
>>>>> GoraRecordWriter and I am getting concurrent modification errors like 
>>>>> below.
>>>>> 
>>>>> Caused by: java.util.ConcurrentModificationException
>>>>>               at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
>>>>>               at java.util.HashMap$KeyIterator.next(HashMap.java:828)
>>>>>               at org.apache.gora.cassandra.store.CassandraStore.flush(CassandraStore.java:192)
>>>>>               at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:65)
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> *Lewis*
>>>> 
>>> 
>> 


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
