Great work, thanks Alexis! Maybe it's time to close out GORA-22 then and leave any future things that crop up as new issues.
Cheers,
Chris

On Oct 1, 2011, at 4:07 AM, Alexis wrote:

> Last revision 1177960 should now fix the thread-safety issue:
>
> http://svn.apache.org/viewvc/incubator/gora/trunk/gora-cassandra/src/main/java/org/apache/gora/cassandra/store/CassandraStore.java?r1=1177960&r2=1177959&pathrev=1177960
>
> Please comment on https://issues.apache.org/jira/browse/GORA-22 if
> there is anything else.
>
> Alexis
>
> On Sun, Sep 4, 2011 at 10:43 AM, Alexis <[email protected]> wrote:
>> Hi,
>>
>> I submitted the patch for peer review by just attaching it to the
>> issue: https://issues.apache.org/jira/browse/GORA-22
>>
>> See this article about concurrency and HashMap to read about the topic:
>> http://www.ibm.com/developerworks/java/library/j-jtp07233/index.html
>>
>> I ended up calling toArray over the key set to get around the
>> ConcurrentModificationException thrown by default with
>> java.util.HashMap when iterating over the keys.
>>
>> I did encounter Cassandra crashes and Hector exceptions, though not
>> that often (usually because of GC triggered by the Cassandra daemon?),
>> on my poor 5-year-old laptop while running the Nutch parse command,
>> which is very CPU and IO intensive. In mapred-site.xml (see attached
>> config), things worked once the read batch was kept reasonable (400
>> rows at a time) and kept out of step with the write batch (for
>> example, 843 rows written per batch) so that the two don't happen
>> simultaneously.
>>
>>
>> Alexis
>>
>> On Tue, Aug 30, 2011 at 1:24 AM, Alexis <[email protected]> wrote:
>>> Hi Tom,
>>>
>>> Thanks for testing Nutch 2.0 & Cassandra and reporting the obvious
>>> bug. I must say development and testing on Gora & Nutch are not
>>> very active, but at least there is some.
>>>
>>>
>>> 1. As regards your ConcurrentModification issue, it looks like it
>>> happens when flushing the store. From your exception stacktrace:
>>> (Line 192 in org.apache.gora.cassandra.store.CassandraStore)
>>> for (K key: this.buffer.keySet()) {
>>>
>>> while there are other threads adding new keys to the HashMap:
>>>
>>> (Line 266)
>>> this.buffer.put(key, p);
>>>
>>> "it is not generally permissible for one thread to modify a Collection
>>> while another thread is iterating over it."
>>>
>>> Let me try to reproduce the bug and fix it with this in mind:
>>> How about introducing some mutex / lock mechanism with
>>> java.util.concurrent.locks.Lock or, easier, using a thread-safe
>>> implementation such as java.util.concurrent.ConcurrentHashMap?
>>>
>>>
>>> 2. Regarding the OutOfMemory error, maybe decreasing the flushing
>>> frequency as described here?
>>> http://techvineyard.blogspot.com/2011/02/gora-orm-framework-for-hadoop-jobs.html#I_O_Frequency
>>>
>>> I like to use the jvisualvm utility from the JDK, which monitors
>>> memory usage and shows how it evolves during the execution of
>>> the class...
>>>
>>> Alexis
>>>
>>> On Mon, Aug 29, 2011 at 1:50 PM, Tom Davidson <[email protected]> wrote:
>>>> Hi Lewis,
>>>>
>>>> I was running Nutch deployed with a dedicated Cassandra cluster. Frankly,
>>>> I have given up on using Nutch 2 at this time as it seems highly unstable
>>>> and not really in active development. Your effort to address this is
>>>> encouraging. Because Nutch uses multithreading in the fetchers, I was
>>>> getting ConcurrentModification errors and OutOfMemory errors on a regular
>>>> basis in the CassandraStore. As far as I recall, the caching/flushing
>>>> implementation is just not thread safe. If the CassandraStore caching was
>>>> completely removed it may work, but would probably not be very efficient.
>>>> If I were to fix this class, I would try to rewrite it to use Hector
>>>> batched mutations instead.
>>>>
>>>> Tom
>>>>
>>>> -----Original Message-----
>>>> From: lewis john mcgibbney [mailto:[email protected]]
>>>> Sent: Monday, August 29, 2011 1:41 PM
>>>> To: [email protected]; [email protected]
>>>> Subject: Re: Gora CassandraStore is not thread safe?
>>>>
>>>> Hi Tom,
>>>>
>>>> Apologies for cross posting, this would not usually be the case but I'm
>>>> hoping that if any results come from the thread then both communities can
>>>> benefit.
>>>>
>>>> I'm in the process of getting Cassandra 0.8.4 working with Nutch 2.0 and
>>>> Gora 0.2 myself and seem to be having some nasty problems.
>>>>
>>>> Some questions for you
>>>>
>>>> 1) How are you running Nutch, local or deploy?
>>>> 2) How are you running Cassandra, local or deployed in a cluster?
>>>>
>>>> The obvious thoughts are that this is a bug and that there are
>>>> method(s)/object(s) which are not thread safe.
>>>>
>>>> Have you gotten any further with this?
>>>>
>>>> Lewis
>>>>
>>>>
>>>> On Wed, Aug 10, 2011 at 8:43 PM, Tom Davidson <[email protected]>
>>>> wrote:
>>>>
>>>>> Has anyone tested the CassandraStore in gora 0.2 using multiple threads?
>>>>> The nutch 2 fetcher architecture has many threads writing to one
>>>>> GoraRecordWriter and I am getting concurrent modification errors like
>>>>> below.
>>>>>
>>>>> Caused by: java.util.ConcurrentModificationException
>>>>> at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
>>>>> at java.util.HashMap$KeyIterator.next(HashMap.java:828)
>>>>> at org.apache.gora.cassandra.store.CassandraStore.flush(CassandraStore.java:192)
>>>>> at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:65)
>>>>>
>>>>
>>>>
>>>> --
>>>> *Lewis*
>>>>
>>>
>>

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
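
For readers arriving at this thread later, here is a minimal sketch of the second option raised above: buffering writes in a java.util.concurrent.ConcurrentHashMap rather than a java.util.HashMap, so that a flush can iterate over the keys while other threads keep adding to the buffer. The class name BufferSketch and its put/flush methods are hypothetical and exist only for illustration. This is not the CassandraStore code, and it is not a description of what revision 1177960 actually changed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a write buffer shared by many writer threads.
class BufferSketch<K, V> {
    // ConcurrentHashMap iterators are weakly consistent: they tolerate
    // concurrent put/remove and never throw ConcurrentModificationException.
    private final Map<K, V> buffer = new ConcurrentHashMap<>();

    // Called concurrently by the writer threads.
    public void put(K key, V value) {
        buffer.put(key, value);
    }

    // Drains the entries the iteration sees and returns their values.
    // Entries added too late to be seen simply wait for the next flush.
    public List<V> flush() {
        List<V> drained = new ArrayList<>();
        for (K key : buffer.keySet()) {
            V value = buffer.remove(key); // atomic; null if already removed
            if (value != null) {
                drained.add(value);
            }
        }
        return drained;
    }

    public static void main(String[] args) throws InterruptedException {
        BufferSketch<String, Integer> store = new BufferSketch<>();
        List<Thread> writers = new ArrayList<>();
        for (int t = 0; t < 4; t++) {
            final int id = t;
            Thread writer = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    store.put(id + ":" + i, i); // keys are unique per writer
                }
            });
            writers.add(writer);
            writer.start();
        }
        // Flush while the writers may still be running.
        int drained = store.flush().size();
        for (Thread writer : writers) {
            writer.join();
        }
        // Final flush once all writers are done picks up the remainder.
        drained += store.flush().size();
        System.out.println("drained " + drained + " rows");
    }
}
```

Because every key is unique and each entry is removed exactly once, the two flushes together should account for all 4000 rows regardless of how the threads interleave. A real store would also need to decide what happens when a key is overwritten between two flushes, which this sketch does not address.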
