I think you're supposed to read to the point where it says "queues stuff
in memory before sending to the server" and extrapolate that writing to
the queue too fast is a bad thing.
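
One way to bound that queue is to keep the Futures the client returns and
drain them in batches, so only a limited number of operations are ever
outstanding. A minimal sketch (the batch size of 1000 and the key/value
scheme are arbitrary choices for illustration, not anything the client
prescribes):

----------------------
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;

import net.spy.memcached.MemcachedClient;

public class ThrottledLoad {

  public static void main(String[] args) throws Exception {
    MemcachedClient mc = new MemcachedClient(new InetSocketAddress("localhost", 11211));
    List<Future<Boolean>> pending = new ArrayList<Future<Boolean>>();
    int batchSize = 1000; // drain point; tune for your network

    for (int i = 0; i < 6000000; i++) {
      pending.add(mc.set("String" + i, 600, "Value" + i));
      if (pending.size() >= batchSize) {
        // Block until the whole batch is acknowledged before queueing
        // more, so the client's internal write queue stays bounded.
        for (Future<Boolean> f : pending) {
          f.get();
        }
        pending.clear();
      }
    }
    for (Future<Boolean> f : pending) {
      f.get(); // drain the tail
    }
    mc.shutdown();
  }
}
----------------------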

On Sun, 17 Oct 2010, Shi Yu wrote:

> Kelvin,
>
> This is the year 2010, and computer programs should not be this fragile.
> My code is just a quick, simple toy problem meant to find out why I
> failed so many times in my real problem. Before I posted my problem, I
> checked and searched many documents and read through the API, and there
> is no clear instruction telling me what I should do to prevent such an
> error. I don't have time to trigger bugs in an API on purpose; I am
> doing NLP POS tagging and have exactly 6 million stemmed words to
> store. Fortunately or unfortunately for me, that number is exactly what
> triggers the failure, so I had to spend 6 hours finding out the reason.
> Actually, the spy client is the first API I tried; as I pointed out in
> my first post, it is fast, but there is an error. I don't think that,
> for a normal production-quality API, memory leaks should be something
> the user has to worry about.
>
> Shi
>
> On Sun, Oct 17, 2010 at 1:11 AM, Kelvin Edmison <kel...@kindsight.net> wrote:
> > Shi,
> >
> >  Be careful when you start calling it a buggy API, especially given the
> > quality of the code you presented in your initial test case.  Your
> > bugs-per-LOC was pretty high.
> >
> > However, it seems that you did in fact stumble into a bug in the Spy client,
> > but only because you did no error checking at all.
> >
> > Dustin,
> >  While trying to re-create this problem and point out the various errors in
> > his code, I found that, in his test case, if I did not call Future.get() to
> > verify the result of the set, the spymemcached client leaked memory.  Given
> > that the spymemcached wiki says that fire-and-forget is a valid mode of
> > usage, this appears to be a bug.
> >
> > Here's my testcase against spymemcached-2.5.jar:
> > 'java -cp .:./spymemcached-2.5.jar FutureResultLeak true' leaks memory and
> > will eventually die with an OutOfMemoryError.
> > 'java -cp .:./spymemcached-2.5.jar FutureResultLeak false' does not leak and
> > runs to completion.
> >
> > Here's the code. It's based on Shi's testcase so he and I now share the
> > blame for code quality :)
> >
> > ----------------------
> > import java.net.InetSocketAddress;
> > import java.util.concurrent.Future;
> > import java.util.concurrent.TimeUnit;
> >
> > import net.spy.memcached.MemcachedClient;
> >
> > public class FutureResultLeak {
> >
> >  public static void main(String[] args) throws Exception {
> >    // Pass "true" to skip Future.get() (fire-and-forget) and leak memory.
> >    boolean leakMemory = false;
> >    if (args.length >= 1) {
> >      leakMemory = Boolean.valueOf(args[0]);
> >    }
> >    System.out.println("Testcase will " + (leakMemory ? "leak memory" : "not leak memory"));
> >
> >    MemcachedClient mc = new MemcachedClient(new InetSocketAddress("localhost", 11211));
> >    mc.flush();
> >    System.out.println("Memcached flushed ...");
> >
> >    int count = 0;
> >    int logInterval = 100000;
> >    int itemExpiryTime = 600;
> >    long intervalStartTime = System.currentTimeMillis();
> >    for (int i = 0; i < 6000000; i++) {
> >      String a = "String" + i;
> >      String b = "Value" + i;
> >
> >      Future<Boolean> f = mc.add(a, itemExpiryTime, b);
> >      if (!leakMemory) {
> >        // Waiting on the future verifies the set and lets the client
> >        // release the completed operation.
> >        f.get();
> >      }
> >      count++;
> >      if (count % logInterval == 0) {
> >        long elapsed = System.currentTimeMillis() - intervalStartTime;
> >        // elapsed is in milliseconds, so scale by 1000 for items/sec.
> >        double itemsPerSec = logInterval * 1000.0 / elapsed;
> >        System.out.println(count + " elements added in " + elapsed + "ms (" + itemsPerSec + " per sec).");
> >        intervalStartTime = System.currentTimeMillis();
> >      }
> >    }
> >
> >    System.out.println("done " + count + " records inserted");
> >    mc.shutdown(60, TimeUnit.SECONDS);
> >  }
> > }
> > ----------------------
> >
> >
> > Regards,
> >  Kelvin
> >
> >
> >
> >
> > On 17/10/10 12:28 AM, "Shi Yu" <shee...@gmail.com> wrote:
> >
> >> I ran it with the following java command on a 64-bit Unix machine
> >> that has 8G of memory. I separated the Map into three parts, and it
> >> still failed. TBH I think there is some bug in the spymemcached input
> >> method. With Whalin's API there is no problem at all with only a 2G
> >> heap size; it's just a little slower, but that's definitely better
> >> than being stuck for 6 hours on a buggy API.
> >>
> >> java -Xms4G -Xmx4G -classpath ./lib/spymemcached-2.5.jar Memcaceload
> >>
> >> Here is the error output:
> >>
> >> 2010-10-16 22:40:50.959 INFO net.spy.memcached.MemcachedConnection: Added {QA sa=ocuic32.research/192.168.136.36:11211, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
> >> Memchaced flushed ...
> >> Cache loader created ...
> >> 2010-10-16 22:40:50.989 INFO net.spy.memcached.MemcachedConnection: Connection state changed for sun.nio.ch.selectionkeyi...@25fa1bb6
> >> map1 loaded
> >> map2 loaded
> >> java.lang.OutOfMemoryError: Java heap space
> >>         at sun.nio.cs.UTF_8.newEncoder(UTF_8.java:51)
> >>         at java.lang.StringCoding$StringEncoder.<init>(StringCoding.java:215)
> >>         at java.lang.StringCoding$StringEncoder.<init>(StringCoding.java:207)
> >>         at java.lang.StringCoding.encode(StringCoding.java:266)
> >>         at java.lang.String.getBytes(String.java:947)
> >>         at net.spy.memcached.KeyUtil.getKeyBytes(KeyUtil.java:20)
> >>         at net.spy.memcached.protocol.ascii.OperationImpl.setArguments(OperationImpl.java:86)
> >>         at net.spy.memcached.protocol.ascii.BaseStoreOperationImpl.initialize(BaseStoreOperationImpl.java:48)
> >>         at net.spy.memcached.MemcachedConnection.addOperation(MemcachedConnection.java:601)
> >>         at net.spy.memcached.MemcachedConnection.addOperation(MemcachedConnection.java:582)
> >>         at net.spy.memcached.MemcachedClient.addOp(MemcachedClient.java:277)
> >>         at net.spy.memcached.MemcachedClient.asyncStore(MemcachedClient.java:314)
> >>         at net.spy.memcached.MemcachedClient.set(MemcachedClient.java:691)
> >>         at net.spy.memcached.util.CacheLoader.push(CacheLoader.java:92)
> >>         at net.spy.memcached.util.CacheLoader.loadData(CacheLoader.java:61)
> >>         at net.spy.memcached.util.CacheLoader.loadData(CacheLoader.java:75)
> >>         at MemchacedLoad.mapload(MemchacedLoad.java:90)
> >>         at MemchacedLoad.main(MemchacedLoad.java:159)
> >>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>         at java.lang.reflect.Method.invoke(Method.java:597)
> >>         at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
> >>         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
> >>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> >>         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
> >>
> >> Shi
> >>
> >> On Sat, Oct 16, 2010 at 10:23 PM, Dustin <dsalli...@gmail.com> wrote:
> >>>
> >>> On Oct 16, 6:45 pm, Shi Yu <shee...@gmail.com> wrote:
> >>>> I have also tried the CacheLoader API; it throws a Java GC error. The
> >>>> thing I haven't tried is to separate the 6 million records into several
> >>>> objects and try CacheLoader. But I don't think it should be that
> >>>> fragile and complicated. I have spent a whole day on this issue, and
> >>>> for now I rely on the hybrid approach to finish the work. But I would
> >>>> be very interested to hear of any solution to this issue.
> >>>
> >>>  I cannot make any suggestions as to why you got an error without
> >>> knowing what you did and what error you got.
> >>>
> >>>  I would not expect the code that you posted to work without a lot of
> >>> memory, tweaking, and a very fast network, since you're just filling an
> >>> output queue as fast as Java will allow.
> >>>
> >>>  You didn't share any code using CacheLoader, so I can only guess as
> >>> to how you may have used it to get an error.  There are three
> >>> different methods you can use -- did you try to create a map with six
> >>> million values and then pass it to the CacheLoader API?  (That would
> >>> very likely give you an out-of-memory error.)
> >>>
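> >>>  For what it's worth, a minimal sketch of the incremental approach:
> >>> feed entries to the Iterator-based loadData overload (visible in the
> >>> stack trace above) instead of materializing a six-million-entry map,
> >>> so only one entry at a time lives on the heap.  The key/value scheme
> >>> and class name here are made up for illustration:
> >>>
> >>> ----------------------
> >>> import java.net.InetSocketAddress;
> >>> import java.util.AbstractMap;
> >>> import java.util.Iterator;
> >>> import java.util.Map;
> >>>
> >>> import net.spy.memcached.MemcachedClient;
> >>> import net.spy.memcached.util.CacheLoader;
> >>>
> >>> public class IncrementalLoad {
> >>>
> >>>   public static void main(String[] args) throws Exception {
> >>>     MemcachedClient mc = new MemcachedClient(new InetSocketAddress("localhost", 11211));
> >>>     CacheLoader loader = new CacheLoader(mc);
> >>>
> >>>     // Generate entries lazily rather than building one huge map.
> >>>     Iterator<Map.Entry<String, Object>> entries =
> >>>         new Iterator<Map.Entry<String, Object>>() {
> >>>       private int i = 0;
> >>>       public boolean hasNext() { return i < 6000000; }
> >>>       public Map.Entry<String, Object> next() {
> >>>         Map.Entry<String, Object> e =
> >>>             new AbstractMap.SimpleEntry<String, Object>("String" + i, "Value" + i);
> >>>         i++;
> >>>         return e;
> >>>       }
> >>>       public void remove() { throw new UnsupportedOperationException(); }
> >>>     };
> >>>
> >>>     // loadData paces the writes against the network for us.
> >>>     loader.loadData(entries);
> >>>     mc.shutdown();
> >>>   }
> >>> }
> >>> ----------------------
> >>>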
> >>>  You could also be taxing the GC considerably by converting integers
> >>> to strings to compute modulus, if your JVM doesn't do proper escape
> >>> analysis.
> >>>
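> >>>  (As a hypothetical illustration of that point -- the original code
> >>> wasn't posted -- compare a string-based divisibility check, which
> >>> allocates on every iteration, with plain integer arithmetic, which
> >>> allocates nothing:)
> >>>
> >>> ----------------------
> >>> public class ModulusCheck {
> >>>   public static void main(String[] args) {
> >>>     int logInterval = 100000;
> >>>     for (int i = 0; i < 6000000; i++) {
> >>>       // Allocates a String every iteration; garbage unless the JVM's
> >>>       // escape analysis eliminates it.
> >>>       boolean viaString = String.valueOf(i % logInterval).equals("0");
> >>>       // Allocation-free check on the int itself.
> >>>       boolean viaInt = (i % logInterval == 0);
> >>>       if (viaString != viaInt) throw new AssertionError();
> >>>     }
> >>>   }
> >>> }
> >>> ----------------------
> >>>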
> >>>  I can assure you there's no magic that will make it fail to load six
> >>> million records through the API as long as you account for the
> >>> realities of your network (which CacheLoader does for you) and your
> >>> available memory.
> >
> >
>
