I think you're supposed to read to the point where it says "queues stuff in memory before sending to the server" and extrapolate that writing to the queue too fast is a bad thing.
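In practice that just means not firing stores forever without ever looking at a Future. Here is a rough sketch of what I mean, not taken from this thread: it blocks on every 10,000th Future (an arbitrary interval) so the client's internal queue gets a chance to drain, and it assumes a local memcached on 11211.

----------------------
import java.net.InetSocketAddress;
import java.util.concurrent.Future;
import net.spy.memcached.MemcachedClient;

public class ThrottledLoad {
  public static void main(String[] args) throws Exception {
    MemcachedClient mc = new MemcachedClient(new InetSocketAddress("localhost", 11211));
    Future<Boolean> f = null;
    for (int i = 0; i < 6000000; i++) {
      // Keep the Future so we can periodically wait for the queue to drain.
      f = mc.set("String" + i, 600, "Value" + i);
      // Blocking every 10,000 ops (arbitrary) bounds how much the client
      // buffers in memory at once, instead of never checking at all.
      if (i % 10000 == 0) {
        f.get();
      }
    }
    if (f != null) {
      f.get(); // wait for the last write before shutting down
    }
    mc.shutdown();
  }
}
----------------------

Checking every Nth Future is the same idea as Kelvin's non-leaking run below, which calls f.get() on every operation; any interval that keeps the outbound queue from growing without bound will do.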
On Sun, 17 Oct 2010, Shi Yu wrote:
> Kelvin,
>
> This is the year 2010 and computer programs should not be that fragile. And I believe my code is just a quick, simple toy case trying to find out why I failed so many times in my real problem. Before I posted my problem, I checked and searched many documents and read through the API, and there is no clear instruction telling me what I should do to prevent such an error. I don't have time to deliberately hunt for bugs in an API; I am doing NLP POS tagging and I have exactly 6 million stemmed words to store. Fortunately or unluckily for me, that number is exactly what triggers the failure, so I had to spend 6 hours finding out the reason. Actually the spy client is the first API I tried; as I pointed out in my first post, it is fast, but there is an error. I don't think that, for a normal end-product API, memory leaks should be something the user has to think about.
>
> Shi
>
> On Sun, Oct 17, 2010 at 1:11 AM, Kelvin Edmison <kel...@kindsight.net> wrote:
> > Shi,
> >
> > Be careful when you start calling it a buggy API, especially given the quality of the code you presented in your initial test case. Your bugs-per-LOC was pretty high.
> >
> > However, it seems that you did in fact stumble into a bug in the Spy client, but only because you did no error checking at all.
> >
> > Dustin,
> > While trying to re-create this problem and point out the various errors in his code, I found that, in his test case, if I did not call Future.get() to verify the result of the set, the spymemcached client leaked memory. Given that the spymemcached wiki says that fire-and-forget is a valid mode of usage, this appears to be a bug.
> >
> > Here's my testcase against spymemcached-2.5.jar:
> > 'java -cp .:./memcached-2.5.jar FutureResultLeak true' leaks memory and will eventually die OOM.
> > 'java -cp .:./memcached-2.5.jar FutureResultLeak false' does not leak and runs to completion.
> >
> > Here's the code. It's based on Shi's testcase, so he and I now share the blame for code quality :)
> >
> > ----------------------
> > import net.spy.memcached.*;
> > import java.lang.*;
> > import java.net.*;
> > import java.util.concurrent.*;
> >
> > public class FutureResultLeak {
> >
> >   public static void main(String[] args) throws Exception {
> >     boolean leakMemory = false;
> >     if (args.length >= 1) {
> >       leakMemory = Boolean.valueOf(args[0]);
> >     }
> >     System.out.println("Testcase will " + (leakMemory ? "leak memory" : "not leak memory"));
"leak memory" : "not > > leak memory")); > > MemcachedClient mc=new MemcachedClient(new > > InetSocketAddress("localhost", 11211)); > > mc.flush(); > > System.out.println("Memcached flushed ..."); > > int count = 0; > > int logInterval = 100000; > > int itemExpiryTime = 600; > > long intervalStartTime = System.currentTimeMillis(); > > for(int i=0;i<6000000;i++){ > > String a = "String"+i; > > String b = "Value"+i; > > > > > > Future<Boolean> f =mc.add(a,itemExpiryTime, b); > > if (!leakMemory) { > > f.get(); > > } > > count++; > > if (count % logInterval == 0) { > > long elapsed = System.currentTimeMillis() - intervalStartTime; > > double itemsPerSec = logInterval*1.0/elapsed; > > System.out.println(count+ " elements added in " + elapsed + " (" + > > itemsPerSec + " per sec)."); > > intervalStartTime = System.currentTimeMillis(); > > } > > } > > > > System.out.println("done "+ count +" records inserted"); > > mc.shutdown(60, TimeUnit.SECONDS); > > } > > } > > ---------------------- > > > > > > Regards, > > Kelvin > > > > > > > > > > On 17/10/10 12:28 AM, "Shi Yu" <shee...@gmail.com> wrote: > > > >> And I run with the following java command on a 64-bit Unix machine > >> which has 8G memory. I separate the Map into three parts, still > >> failed. TBH I think there is some bug in the spymemcached input > >> method. With Whalin's API there is no any problem with only 2G heap > >> size, just a little bit slower but thats definitely better than being > >> stuck for 6 hours on a bugged API. > >> > >> java -Xms4G -Xmx4G -classpath ./lib/spymemcached-2.5.jar Memcaceload > >> > >> Here is the error output: > >> > >> 2010-10-16 22:40:50.959 INFO net.spy.memcached.MemcachedConnection: > >> Added {QA sa=ocuic32.research/192.168.136.36:11211, #Rops=0, #Wops=0, > >> #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect > >> queue > >> Memchaced flushed ... > >> Cache loader created ... 
> >> 2010-10-16 22:40:50.989 INFO net.spy.memcached.MemcachedConnection: Connection state changed for sun.nio.ch.selectionkeyi...@25fa1bb6
> >> map1 loaded
> >> map2 loaded
> >> java.lang.OutOfMemoryError: Java heap space
> >>         at sun.nio.cs.UTF_8.newEncoder(UTF_8.java:51)
> >>         at java.lang.StringCoding$StringEncoder.<init>(StringCoding.java:215)
> >>         at java.lang.StringCoding$StringEncoder.<init>(StringCoding.java:207)
> >>         at java.lang.StringCoding.encode(StringCoding.java:266)
> >>         at java.lang.String.getBytes(String.java:947)
> >>         at net.spy.memcached.KeyUtil.getKeyBytes(KeyUtil.java:20)
> >>         at net.spy.memcached.protocol.ascii.OperationImpl.setArguments(OperationImpl.java:86)
> >>         at net.spy.memcached.protocol.ascii.BaseStoreOperationImpl.initialize(BaseStoreOperationImpl.java:48)
> >>         at net.spy.memcached.MemcachedConnection.addOperation(MemcachedConnection.java:601)
> >>         at net.spy.memcached.MemcachedConnection.addOperation(MemcachedConnection.java:582)
> >>         at net.spy.memcached.MemcachedClient.addOp(MemcachedClient.java:277)
> >>         at net.spy.memcached.MemcachedClient.asyncStore(MemcachedClient.java:314)
> >>         at net.spy.memcached.MemcachedClient.set(MemcachedClient.java:691)
> >>         at net.spy.memcached.util.CacheLoader.push(CacheLoader.java:92)
> >>         at net.spy.memcached.util.CacheLoader.loadData(CacheLoader.java:61)
> >>         at net.spy.memcached.util.CacheLoader.loadData(CacheLoader.java:75)
> >>         at MemchacedLoad.mapload(MemchacedLoad.java:90)
> >>         at MemchacedLoad.main(MemchacedLoad.java:159)
> >>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>         at java.lang.reflect.Method.invoke(Method.java:597)
> >>         at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
> >>         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
> >>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> >>         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
> >>
> >> Shi
> >>
> >> On Sat, Oct 16, 2010 at 10:23 PM, Dustin <dsalli...@gmail.com> wrote:
> >>>
> >>> On Oct 16, 6:45 pm, Shi Yu <shee...@gmail.com> wrote:
> >>>> I have also tried the CacheLoader API; it throws a Java GC error. The thing I haven't tried is to separate the 6 million records into several objects and try CacheLoader on each. But I don't think it should be that fragile and complicated. I have spent a whole day on this issue, and for now I just rely on the hybrid approach to finish the work. But I would be very interested to hear any solution to this issue.
> >>>
> >>> I cannot make any suggestions as to why you got an error without knowing what you did and what error you got.
> >>>
> >>> I would not expect the same code that you posted to work without a lot of memory, tweaking, and a very fast network, since you're just filling an output queue as fast as Java will allow you.
> >>>
> >>> You didn't share any code using CacheLoader, so I can only guess as to how you may have used it to get an error. There are three different methods you can use -- did you try to create a map with six million values and then pass it to the CacheLoader API? That would very likely give you an out-of-memory error.
> >>>
> >>> You could also be taxing the GC considerably by converting integers to strings to compute the modulus, if your JVM doesn't do proper escape analysis.
> >>>
> >>> I can assure you there's no magic that will make it fail to load six million records through the API, as long as you account for the realities of your network (which CacheLoader does for you) and your available memory.
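To make the CacheLoader point concrete: instead of materializing one Map with all six million entries and handing it to loadData(), the data can be fed a record at a time through push() (the method visible in the stack trace above), checking a Future occasionally so the outbound queue can drain. This is only a rough sketch under assumptions: the single-argument CacheLoader constructor, push() returning a Future, a local memcached on 11211, and a made-up 10,000-op check interval.

----------------------
import java.net.InetSocketAddress;
import java.util.concurrent.Future;
import net.spy.memcached.MemcachedClient;
import net.spy.memcached.util.CacheLoader;

public class ChunkedCacheLoad {
  public static void main(String[] args) throws Exception {
    MemcachedClient mc = new MemcachedClient(new InetSocketAddress("localhost", 11211));
    CacheLoader loader = new CacheLoader(mc);  // assumed single-argument constructor
    Future<Boolean> f = null;
    for (int i = 0; i < 6000000; i++) {
      // push() hands one entry to the client instead of requiring a giant Map.
      f = loader.push("String" + i, "Value" + i);
      // Occasionally wait so the client's write queue can drain.
      if (i % 10000 == 0 && f != null) {
        f.get();
      }
    }
    if (f != null) {
      f.get();
    }
    mc.shutdown();
  }
}
----------------------

Splitting the input into smaller maps and calling loadData() on each chunk would accomplish the same thing without ever holding all six million entries in the JVM at once.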