Re: Crash when uploading large data sets

2011-05-13 Thread Jonathan Ellis
You should upgrade to the latest Sun JVM. OpenJDK is almost a year
behind in bug fixes.
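To see which JVM a node is actually running, check the `java -version` banner (the same string appears in the hs_err header below). A minimal sketch of the check, using the banner reported by this crash; on a live node you would capture it with `banner=$(java -version 2>&1)`:

```shell
# Banner taken from the hs_err report in this thread.
banner='OpenJDK 64-Bit Server VM (1.6.0_0-b11 mixed mode linux-amd64)'
case "$banner" in
  *OpenJDK*) echo "OpenJDK detected -- consider switching to the Sun JVM" ;;
  *)         echo "Sun/Oracle JVM" ;;
esac
```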

On Fri, May 13, 2011 at 11:40 AM, James Cipar  wrote:
> It is a 64-bit VM.  I didn't notice the hs_err_pid.log files since I'm starting 
> Cassandra over ssh, so they're in my home directory instead of my working 
> directory.  I've attached one of those below.  I don't know much about Java, 
> so I'm not sure how to interpret this file.

Re: Crash when uploading large data sets

2011-05-13 Thread James Cipar
It is a 64-bit VM.  I didn't notice the hs_err_pid.log files since I'm starting 
Cassandra over ssh, so they're in my home directory instead of my working 
directory.  I've attached one of those below.  I don't know much about Java, so 
I'm not sure how to interpret this file.





#
# An unexpected error has been detected by Java Runtime Environment:
#
#  Internal Error (nmethod.cpp:1707), pid=10068, tid=1085823312
#  Error: guarantee(cont_offset != 0,"unhandled implicit exception in compiled 
code")
#
# Java VM: OpenJDK 64-Bit Server VM (1.6.0_0-b11 mixed mode linux-amd64)
# If you would like to submit a bug report, please visit:
#   http://icedtea.classpath.org/bugzilla
#

---  T H R E A D  ---

Current thread (0x7f7e441a9c00):  JavaThread "EXPIRING-MAP-TIMER-1" daemon 
[_thread_in_Java, id=10095, stack(0x40b65000,0x40b86000)]

Stack: [0x40b65000,0x40b86000],  sp=0x40b83eb0,  free 
space=123k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x5d198f]
V  [libjvm.so+0x5d1cf1]
V  [libjvm.so+0x27e506]
V  [libjvm.so+0x4970d6]
V  [libjvm.so+0x514cd9]
V  [libjvm.so+0x4b0b64]


---  P R O C E S S  ---

Java Threads: ( => current thread )
  0x02913c00 JavaThread "NonPeriodicTasks:1" [_thread_blocked, 
id=10233, stack(0x401fc000,0x4021d000)]
  0x026eec00 JavaThread "FILEUTILS-DELETE-POOL:1" [_thread_blocked, 
id=10222, stack(0x42533000,0x42554000)]
  0x7f7e3c01c000 JavaThread "Thread-22" [_thread_in_native, id=10212, 
stack(0x42512000,0x42533000)]
  0x0253c000 JavaThread "Thread-21" [_thread_in_native, id=10211, 
stack(0x40dfc000,0x40e1d000)]
  0x0253a800 JavaThread "Thread-20" [_thread_in_native, id=10210, 
stack(0x40ddb000,0x40dfc000)]
  0x02537400 JavaThread "Thread-19" [_thread_in_native, id=10209, 
stack(0x424f1000,0x42512000)]
  0x02725c00 JavaThread "pool-1-thread-1" [_thread_in_native, id=10208, 
stack(0x4007d000,0x4009e000)]
  0x026f7400 JavaThread "Thread-18" [_thread_in_native, id=10207, 
stack(0x4034f000,0x4037)]
  0x02902800 JavaThread "Thread-17" [_thread_in_native, id=10206, 
stack(0x40d6b000,0x40d8c000)]
  0x02901400 JavaThread "Thread-16" [_thread_in_native, id=10205, 
stack(0x424d,0x424f1000)]
  0x02613c00 JavaThread "Thread-15" [_thread_in_native, id=10204, 
stack(0x4024e000,0x4026f000)]
  0x026ad800 JavaThread "Thread-14" [_thread_in_native, id=10203, 
stack(0x40d11000,0x40d32000)]
  0x0276f000 JavaThread "Thread-13" [_thread_in_native, id=10202, 
stack(0x424af000,0x424d)]
  0x026b2c00 JavaThread "Thread-12" [_thread_in_native, id=10201, 
stack(0x4049,0x404b1000)]
  0x026aec00 JavaThread "Thread-11" [_thread_in_native, id=10200, 
stack(0x4248e000,0x424af000)]
  0x0254ec00 JavaThread "Thread-10" [_thread_in_native, id=10199, 
stack(0x4246d000,0x4248e000)]
  0x0254d000 JavaThread "Thread-9" [_thread_in_native, id=10198, 
stack(0x4244c000,0x4246d000)]
  0x02505000 JavaThread "Thread-8" [_thread_in_native, id=10197, 
stack(0x4242b000,0x4244c000)]
  0x02502400 JavaThread "Thread-7" [_thread_in_native, id=10196, 
stack(0x4240a000,0x4242b000)]
  0x02500400 JavaThread "WRITE-/172.19.149.80" [_thread_blocked, 
id=10195, stack(0x407dd000,0x407fe000)]
  0x024ff000 JavaThread "WRITE-/172.19.149.80" [_thread_blocked, 
id=10194, stack(0x40c8e000,0x40caf000)]
  0x024f4400 JavaThread "WRITE-/172.19.149.64" [_thread_blocked, 
id=10193, stack(0x423e9000,0x4240a000)]
  0x024f3000 JavaThread "WRITE-/172.19.149.64" [_thread_blocked, 
id=10192, stack(0x423c8000,0x423e9000)]
  0x024f1400 JavaThread "WRITE-/172.19.149.71" [_thread_blocked, 
id=10191, stack(0x40a2d000,0x40a4e000)]
  0x024f JavaThread "WRITE-/172.19.149.71" [_thread_blocked, 
id=10190, stack(0x423a7000,0x423c8000)]
  0x024ee400 JavaThread "WRITE-/172.19.149.62" [_thread_blocked, 
id=10189, stack(0x42386000,0x423a7000)]
  0x026bf800 JavaThread "WRITE-/172.19.149.62" [_thread_blocked, 
id=10188, stack(0x42365000,0x42386000)]
  0x026bdc00 JavaThread "WRITE-/172.19.149.72" [_thread_blocked, 
id=10187, stack(0x40dba000,0x40ddb000)]
  0x026bc400 JavaThread "WRITE-/172.19.149.72" [_thread_blocked, 
id=10186, stack(0x42344000,0x42365000)]
  0x026bac00 JavaThread "WRITE-/172.19.149.63" [_thread_blocked, 
id=10185, stack(0x4

Re: Crash when uploading large data sets

2011-05-12 Thread Jeffrey Kesselman
Is this a 64-bit VM?

A 32-bit Java VM with default C-heap settings can only actually use
about 2 GB of Java heap.
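One quick way to check the word size from the shell (a sketch; `getconf` is POSIX, and a 64-bit JVM additionally requires a 64-bit OS userland):

```shell
# Print the userland word size: 64 on a 64-bit system, 32 on 32-bit.
# A 32-bit JVM shares its ~4 GB address space with the C heap, thread
# stacks, and JIT code, which is why ~2 GB is the practical heap ceiling.
bits=$(getconf LONG_BIT)
echo "word size: ${bits}-bit"
```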

On Thu, May 12, 2011 at 8:08 PM, James Cipar  wrote:
> Oh, forgot this detail:  I have no swap configured, so swapping is not the 
> cause of the crash.  Could it be that I'm running out of memory on a 15GB 
> machine?  That seems unlikely.  I grepped dmesg for "oom" and didn't see 
> anything from the oom killer, and I used the instructions from the following 
> web page and didn't see that the oom killer had killed anything.
>
> http://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer
>
> jcipar@172-19-149-62:~$ sudo cat /var/log/messages | grep --ignore-case 
> "killed process"
> jcipar@172-19-149-62:~$
>
>
>
> Also, this is pretty subjective, so I can't say for sure until it finishes, 
> but this seems to be running *much* slower after setting the heap size and 
> setting up JNA.

Re: Crash when uploading large data sets

2011-05-12 Thread Jonathan Ellis
If it's a JVM crash there should be an hs_err_pid.log file left around
in the directory you started Cassandra from.

On Thu, May 12, 2011 at 6:15 PM, James Cipar  wrote:
> I'm using Cassandra 0.7.5, and uploading about 200 GB of data total (20 GB 
> unique data), to a cluster of 10 servers.  I'm using batch_mutate, and 
> breaking the data up into chunks of about 10k records.  Each record is about 
> 5KB, so a total of about 50MB per batch.  When I upload a smaller 2 GB data 
> set, everything works fine.  When I upload the 20 GB data set, servers will 
> occasionally crash.  Currently I have my client code automatically detect 
> this and restart the server, but that is less than ideal.
>
> I'm not sure what information to gather to determine what's going on here.  
> Here is a sample of a log file from when a crash occurred.  The crash was 
> immediately after the log entry tagged "2011-05-12 19:02:19,377".  Any idea 
> what's going on here?  Any other info I can gather to try to debug this?
>
>
>
>
>
>
>
>  INFO [ScheduledTasks:1] 2011-05-12 19:02:07,855 GCInspector.java (line 128) 
> GC for ParNew: 375 ms, 576641232 reclaimed leaving 5471432144 used; max is 
> 7774142464
>  INFO [ScheduledTasks:1] 2011-05-12 19:02:08,857 GCInspector.java (line 128) 
> GC for ParNew: 450 ms, -63738232 reclaimed leaving 5546942544 used; max is 
> 7774142464
>  INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:10,652 CommitLogSegment.java (line 
> 50) Creating new commitlog segment 
> /mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241330652.log
>  INFO [MutationStage:24] 2011-05-12 19:02:10,680 ColumnFamilyStore.java (line 
> 1070) Enqueuing flush of Memtable-Standard1@1256245282(51921529 bytes, 
> 1115783 operations)
>  INFO [FlushWriter:1] 2011-05-12 19:02:10,680 Memtable.java (line 158) 
> Writing Memtable-Standard1@1256245282(51921529 bytes, 1115783 operations)
>  INFO [ScheduledTasks:1] 2011-05-12 19:02:12,932 GCInspector.java (line 128) 
> GC for ParNew: 249 ms, 571827736 reclaimed leaving 3165899760 used; max is 
> 7774142464
>  INFO [ScheduledTasks:1] 2011-05-12 19:02:15,253 GCInspector.java (line 128) 
> GC for ParNew: 341 ms, 561823592 reclaimed leaving 1764208800 used; max is 
> 7774142464
>  INFO [FlushWriter:1] 2011-05-12 19:02:16,743 Memtable.java (line 165) 
> Completed flushing 
> /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-74-Data.db (53646223 
> bytes)
>  INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:16,745 CommitLog.java (line 440) 
> Discarding obsolete commit 
> log:CommitLogSegment(/mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241306438.log)
>  INFO [ScheduledTasks:1] 2011-05-12 19:02:18,256 GCInspector.java (line 128) 
> GC for ParNew: 305 ms, 544491840 reclaimed leaving 865198712 used; max is 
> 7774142464
>  INFO [MutationStage:19] 2011-05-12 19:02:19,000 ColumnFamilyStore.java (line 
> 1070) Enqueuing flush of Memtable-Standard1@479849353(51941121 bytes, 1115783 
> operations)
>  INFO [FlushWriter:1] 2011-05-12 19:02:19,000 Memtable.java (line 158) 
> Writing Memtable-Standard1@479849353(51941121 bytes, 1115783 operations)
>  INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,310 SSTable.java (line 147) 
> Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-51
>  INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,324 SSTable.java (line 147) 
> Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-55
>  INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,339 SSTable.java (line 147) 
> Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-58
>  INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,357 SSTable.java (line 147) 
> Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-67
>  INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,377 SSTable.java (line 147) 
> Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-61
>  INFO [main] 2011-05-12 19:02:21,026 AbstractCassandraDaemon.java (line 78) 
> Logging initialized
>  INFO [main] 2011-05-12 19:02:21,040 AbstractCassandraDaemon.java (line 96) 
> Heap size: 7634681856/7635730432
>  INFO [main] 2011-05-12 19:02:21,042 CLibrary.java (line 61) JNA not found. 
> Native methods will be disabled.
>  INFO [main] 2011-05-12 19:02:21,052 DatabaseDescriptor.java (line 121) 
> Loading settings from 
> file:/h/jcipar/Projects/HP/OtherDBs/Cassandra/apache-cassandra-0.7.5/conf/cassandra.yaml
>  INFO [main] 2011-05-12 19:02:21,178 DatabaseDescriptor.java (line 181) 
> DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
>  INFO [main] 2011-05-12 19:02:21,310 SSTableReader.java (line 154) Opening 
> /mnt/scratch/jcipar/cassandra/data/system/Schema-f-1
>  INFO [main] 2011-05-12 19:02:21,327 SSTableReader.java (line 154) Opening 
> /mnt/scratch/jcipar/cassandra/data/system/Schema-f-2
>  INFO [main] 2011-05-12 19:02:21,336 SSTableReader.java (line 154) Opening 
> /mnt/scratch/jcipar/cassandra/data/system/Migrations-f-1
>  INFO [main] 2011-05-12 19:02:21,337 SSTableReader.java (line 154) Opening 
> /mnt/s

Re: Crash when uploading large data sets

2011-05-12 Thread James Cipar
Oh, forgot this detail:  I have no swap configured, so swapping is not the 
cause of the crash.  Could it be that I'm running out of memory on a 15GB 
machine?  That seems unlikely.  I grepped dmesg for "oom" and didn't see 
anything from the oom killer, and I used the instructions from the following 
web page and didn't see that the oom killer had killed anything.

http://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer

jcipar@172-19-149-62:~$ sudo cat /var/log/messages | grep --ignore-case "killed 
process"
jcipar@172-19-149-62:~$ 
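Beyond /var/log/messages, the kernel ring buffer is worth searching too. A hedged sketch (log file paths vary by distro; /var/log/messages and /var/log/syslog are assumptions here):

```shell
# Count OOM-killer "Killed process" lines across dmesg and common syslog
# locations; prints 0 if the OOM killer never fired (or logs were rotated).
count=$({ dmesg 2>/dev/null; cat /var/log/messages /var/log/syslog 2>/dev/null; } \
        | grep -ci "killed process" || true)
echo "oom-killer kills found: ${count:-0}"
```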



Also, this is pretty subjective, so I can't say for sure until it finishes, but 
this seems to be running *much* slower after setting the heap size and setting 
up JNA.



On May 12, 2011, at 7:52 PM, James Cipar wrote:

> It looks like MAX_HEAP_SIZE is set in cassandra-env.sh to be half of my 
> physical memory.  These are 15GB VMs, so that's 7.5GB for Cassandra.  I would 
> have expected that to work, but I will override to 13 GB just to see what 
> happens.
> 
> I've also got the JNA thing set up.  Do you think this would cause the 
> crashes, or is it just a performance improvement?

Re: Crash when uploading large data sets

2011-05-12 Thread James Cipar
It looks like MAX_HEAP_SIZE is set in cassandra-env.sh to be half of my 
physical memory.  These are 15GB VMs, so that's 7.5GB for Cassandra.  I would 
have expected that to work, but I will override to 13 GB just to see what 
happens.
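For reference, the override described above is a one-line change. A sketch of what conf/cassandra-env.sh might contain afterward; the 13G figure comes from the experiment above, and HEAP_NEWSIZE is an assumed companion setting (the usual guidance is roughly 100 MB per CPU core):

```shell
# conf/cassandra-env.sh -- override the computed half-of-RAM default.
MAX_HEAP_SIZE="13G"
HEAP_NEWSIZE="800M"   # young-generation size; assumed value, tune per core count
```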

I've also got the JNA thing set up.  Do you think this would cause the crashes, 
or is it just a performance improvement?



On May 12, 2011, at 7:27 PM, Sameer Farooqui wrote:

> The key JVM options for Cassandra are in cassandra.in.sh.
> 
> What is your min and max heap size?
> 
> The default setting of max heap size is 1GB. How much RAM do your nodes have? 
> You may want to increase this setting. You can also set the -Xmx and -Xms 
> options to the same value to keep Java from having to manage heap growth. On 
> a 32-bit machine, you can get a max of about 1.6 GB of heap; you can get a 
> lot more on 64-bit.
> 
> Try messing with some of the other settings in the cassandra.in.sh file.
> 
> You may not have DEBUG mode turned on for Cassandra and therefore may not be 
> getting the full details of what's going on when the server crashes. In the 
> /conf/log4j-server.properties file, set this line from the 
> default of INFO to DEBUG:
> 
> log4j.rootLogger=INFO,stdout,R
> 
> 
> Also, you haven't configured JNA on this server. Here's some info about it 
> and how to configure it:
> 
> JNA provides Java programs easy access to native shared libraries without 
> writing anything but Java code.
> 
> Note from Cassandra developers for why JNA is needed:
> "Linux aggressively swaps out infrequently used memory to make more room for 
> its file system buffer cache. Unfortunately, modern generational garbage 
> collectors like the JVM's leave parts of its heap un-touched for relatively 
> large amounts of time, leading Linux to swap it out. When the JVM finally 
> goes to use or GC that memory, swap hell ensues.
> 
> Setting swappiness to zero can mitigate this behavior but does not eliminate 
> it entirely. Turning off swap entirely is effective. But to avoid surprising 
> people who don't know about this behavior, the best solution is to tell Linux 
> not to swap out the JVM, and that is what we do now with mlockall via JNA.
> 
> Because of licensing issues, we can't distribute JNA with Cassandra, so you 
> must manually add it to the Cassandra lib/ directory or otherwise place it on 
> the classpath. If the JNA jar is not present, Cassandra will continue as 
> before."
> 
> Get JNA with: 
> cd ~
> wget http://debian.riptano.com/debian/pool/libjna-java_3.2.7-0~nmu.2_amd64.deb
> 
> To install: 
> techlabs@cassandraN1:~$ sudo dpkg -i libjna-java_3.2.7-0~nmu.2_amd64.deb
> (Reading database ... 44334 files and directories currently installed.)
> Preparing to replace libjna-java 3.2.4-2 (using 
> libjna-java_3.2.7-0~nmu.2_amd64.deb) ...
> Unpacking replacement libjna-java ...
> Setting up libjna-java (3.2.7-0~nmu.2) ...
> 
> 
> The deb package will install the JNA jar file to /usr/share/java/jna.jar, but 
> Cassandra only loads it if its in the class path. The easy way to do this is 
> just create a symlink into your Cassandra lib directory (note: replace 
> /home/techlabs with your home dir location):
> ln -s /usr/share/java/jna.jar /home/techlabs/apache-cassandra-0.7.0/lib
> 
> Research:
> http://journal.paul.querna.org/articles/2010/11/11/enabling-jna-in-cassandra/
> 
> 
> - Sameer
> 
> 
> On Thu, May 12, 2011 at 4:15 PM, James Cipar  wrote:
> I'm using Cassandra 0.7.5, and uploading about 200 GB of data total (20 GB 
> unique data), to a cluster of 10 servers.  I'm using batch_mutate, and 
> breaking the data up into chunks of about 10k records.  Each record is about 
> 5KB, so a total of about 50MB per batch.  When I upload a smaller 2 GB data 
> set, everything works fine.  When I upload the 20 GB data set, servers will 
> occasionally crash.  Currently I have my client code automatically detect 
> this and restart the server, but that is less than ideal.
> 
> I'm not sure what information to gather to determine what's going on here.  
> Here is a sample of a log file from when a crash occurred.  The crash was 
> immediately after the log entry tagged "2011-05-12 19:02:19,377".  Any idea 
> what's going on here?  Any other info I can gather to try to debug this?
> 
> 
> 
> 
> 
> 
> 
>  INFO [ScheduledTasks:1] 2011-05-12 19:02:07,855 GCInspector.java (line 128) 
> GC for ParNew: 375 ms, 576641232 reclaimed leaving 5471432144 used; max is 
> 7774142464
>  INFO [ScheduledTasks:1] 2011-05-12 19:02:08,857 GCInspector.java (line 128) 
> GC for ParNew: 450 ms, -63738232 reclaimed leaving 5546942544 used; max is 
> 7774142464
>  INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:10,652 CommitLogSegment.java (line 
> 50) Creating new commitlog segment 
> /mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241330652.log
>  INFO [MutationStage:24] 2011-05-12 19:02:10,680 ColumnFamilyStore.java (line 
> 1070) Enqueuing flush of Memtable-Standard1@1256245282(51921529 bytes, 
> 1115783 operations)

Re: Crash when uploading large data sets

2011-05-12 Thread Sameer Farooqui
The key JVM options for Cassandra are in cassandra.in.sh.

What is your min and max heap size?

The default setting of max heap size is 1GB. How much RAM do your nodes
have? You may want to increase this setting. You can also set the -Xmx and
-Xms options to the same value to keep Java from having to manage heap
growth. On a 32-bit machine, you can get a max of about 1.6 GB of heap; you
can get a lot more on 64-bit.
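Pinning the two flags looks like this (a sketch; in 0.7 these are normally assembled by the startup script rather than typed by hand, and 4g is an illustrative value):

```shell
# Set min and max heap to the same value so the JVM never grows or
# shrinks the heap at runtime.
JVM_OPTS="-Xms4g -Xmx4g"
echo "$JVM_OPTS"
```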

Try messing with some of the other settings in the cassandra.in.sh file.

You may not have DEBUG mode turned on for Cassandra and therefore may not be
getting the full details of what's going on when the server crashes. In the
conf/log4j-server.properties file under your Cassandra install directory, set
this line from the default of INFO to DEBUG:

log4j.rootLogger=INFO,stdout,R
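The switch can be scripted. A sketch using a temporary copy so nothing real is touched; substitute your actual conf/log4j-server.properties path (GNU sed's -i is assumed):

```shell
# Flip the root logger from INFO to DEBUG in a log4j properties file.
printf 'log4j.rootLogger=INFO,stdout,R\n' > /tmp/log4j-server.properties
sed -i 's/^log4j\.rootLogger=INFO/log4j.rootLogger=DEBUG/' /tmp/log4j-server.properties
cat /tmp/log4j-server.properties
```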


Also, you haven't configured JNA on this server. Here's some info about it
and how to configure it:

JNA provides Java programs easy access to native shared libraries without
writing anything but Java code.

Note from Cassandra developers for why JNA is needed:
"Linux aggressively swaps out infrequently used memory to make more room
for its file system buffer cache. Unfortunately, modern generational garbage
collectors like the JVM's leave parts of its heap un-touched for relatively
large amounts of time, leading Linux to swap it out. When the JVM finally
goes to use or GC that memory, swap hell ensues.

Setting swappiness to zero can mitigate this behavior but does not eliminate
it entirely. Turning off swap entirely is effective. But to avoid surprising
people who don't know about this behavior, the best solution is to tell
Linux not to swap out the JVM, and that is what we do now with mlockall via
JNA.

Because of licensing issues, we can't distribute JNA with Cassandra, so you
must manually add it to the Cassandra lib/ directory or otherwise place it
on the classpath. If the JNA jar is not present, Cassandra will continue as
before."
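The swappiness knob mentioned in the note above can be inspected from the shell (a sketch; changing it needs root, and the change does not survive a reboot unless added to /etc/sysctl.conf):

```shell
# Read the current swappiness; 0 tells the kernel to avoid swapping.
swp=$(cat /proc/sys/vm/swappiness 2>/dev/null || echo "n/a")
echo "vm.swappiness=$swp"
# To change it (requires root):  sudo sysctl vm.swappiness=0
```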

Get JNA with:
cd ~
wget http://debian.riptano.com/debian/pool/libjna-java_3.2.7-0~nmu.2_amd64.deb

To install:
techlabs@cassandraN1:~$ sudo dpkg -i libjna-java_3.2.7-0~nmu.2_amd64.deb
(Reading database ... 44334 files and directories currently installed.)
Preparing to replace libjna-java 3.2.4-2 (using
libjna-java_3.2.7-0~nmu.2_amd64.deb) ...
Unpacking replacement libjna-java ...
Setting up libjna-java (3.2.7-0~nmu.2) ...


The deb package will install the JNA jar file to /usr/share/java/jna.jar,
but Cassandra only loads it if it's on the classpath. The easy way to do
this is to create a symlink into your Cassandra lib directory (note:
replace /home/techlabs with your home dir location):
ln -s /usr/share/java/jna.jar /home/techlabs/apache-cassandra-0.7.0/lib
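After creating the link it is worth verifying that it resolves. A sketch using a throwaway path under /tmp rather than the real lib directory:

```shell
# Create a symlink like the one above (demo path) and confirm its target.
ln -sf /usr/share/java/jna.jar /tmp/jna.jar
readlink /tmp/jna.jar
```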

Research:
http://journal.paul.querna.org/articles/2010/11/11/enabling-jna-in-cassandra/


- Sameer


On Thu, May 12, 2011 at 4:15 PM, James Cipar  wrote:

> I'm using Cassandra 0.7.5, and uploading about 200 GB of data total (20 GB
> unique data), to a cluster of 10 servers.  I'm using batch_mutate, and
> breaking the data up into chunks of about 10k records.  Each record is about
> 5KB, so a total of about 50MB per batch.  When I upload a smaller 2 GB data
> set, everything works fine.  When I upload the 20 GB data set, servers will
> occasionally crash.  Currently I have my client code automatically detect
> this and restart the server, but that is less than ideal.
>
> I'm not sure what information to gather to determine what's going on here.
>  Here is a sample of a log file from when a crash occurred.  The crash was
> immediately after the log entry tagged "2011-05-12 19:02:19,377".  Any idea
> what's going on here?  Any other info I can gather to try to debug this?
>
> [quoted GC/flush log snipped; it appears in full in the original message below]

Crash when uploading large data sets

2011-05-12 Thread James Cipar
I'm using Cassandra 0.7.5, and uploading about 200 GB of data total (20 GB 
unique data), to a cluster of 10 servers.  I'm using batch_mutate, and breaking 
the data up into chunks of about 10k records.  Each record is about 5KB, so a 
total of about 50MB per batch.  When I upload a smaller 2 GB data set, 
everything works fine.  When I upload the 20 GB data set, servers will 
occasionally crash.  Currently I have my client code automatically detect this 
and restart the server, but that is less than ideal.

I'm not sure what information to gather to determine what's going on here.  
Here is a sample of a log file from when a crash occurred.  The crash was 
immediately after the log entry tagged "2011-05-12 19:02:19,377".  Any idea 
what's going on here?  Any other info I can gather to try to debug this?







 INFO [ScheduledTasks:1] 2011-05-12 19:02:07,855 GCInspector.java (line 128) GC 
for ParNew: 375 ms, 576641232 reclaimed leaving 5471432144 used; max is 
7774142464
 INFO [ScheduledTasks:1] 2011-05-12 19:02:08,857 GCInspector.java (line 128) GC 
for ParNew: 450 ms, -63738232 reclaimed leaving 5546942544 used; max is 
7774142464
 INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:10,652 CommitLogSegment.java (line 
50) Creating new commitlog segment 
/mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241330652.log
 INFO [MutationStage:24] 2011-05-12 19:02:10,680 ColumnFamilyStore.java (line 
1070) Enqueuing flush of Memtable-Standard1@1256245282(51921529 bytes, 1115783 
operations)
 INFO [FlushWriter:1] 2011-05-12 19:02:10,680 Memtable.java (line 158) Writing 
Memtable-Standard1@1256245282(51921529 bytes, 1115783 operations)
 INFO [ScheduledTasks:1] 2011-05-12 19:02:12,932 GCInspector.java (line 128) GC 
for ParNew: 249 ms, 571827736 reclaimed leaving 3165899760 used; max is 
7774142464
 INFO [ScheduledTasks:1] 2011-05-12 19:02:15,253 GCInspector.java (line 128) GC 
for ParNew: 341 ms, 561823592 reclaimed leaving 1764208800 used; max is 
7774142464
 INFO [FlushWriter:1] 2011-05-12 19:02:16,743 Memtable.java (line 165) 
Completed flushing 
/mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-74-Data.db (53646223 
bytes)
 INFO [COMMIT-LOG-WRITER] 2011-05-12 19:02:16,745 CommitLog.java (line 440) 
Discarding obsolete commit 
log:CommitLogSegment(/mnt/scratch/jcipar/cassandra/commitlog/CommitLog-1305241306438.log)
 INFO [ScheduledTasks:1] 2011-05-12 19:02:18,256 GCInspector.java (line 128) GC 
for ParNew: 305 ms, 544491840 reclaimed leaving 865198712 used; max is 
7774142464
 INFO [MutationStage:19] 2011-05-12 19:02:19,000 ColumnFamilyStore.java (line 
1070) Enqueuing flush of Memtable-Standard1@479849353(51941121 bytes, 1115783 
operations)
 INFO [FlushWriter:1] 2011-05-12 19:02:19,000 Memtable.java (line 158) Writing 
Memtable-Standard1@479849353(51941121 bytes, 1115783 operations)
 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,310 SSTable.java (line 147) 
Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-51
 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,324 SSTable.java (line 147) 
Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-55
 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,339 SSTable.java (line 147) 
Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-58
 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,357 SSTable.java (line 147) 
Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-67
 INFO [NonPeriodicTasks:1] 2011-05-12 19:02:19,377 SSTable.java (line 147) 
Deleted /mnt/scratch/jcipar/cassandra/data/Keyspace1/Standard1-f-61
 INFO [main] 2011-05-12 19:02:21,026 AbstractCassandraDaemon.java (line 78) 
Logging initialized
 INFO [main] 2011-05-12 19:02:21,040 AbstractCassandraDaemon.java (line 96) 
Heap size: 7634681856/7635730432
 INFO [main] 2011-05-12 19:02:21,042 CLibrary.java (line 61) JNA not found. 
Native methods will be disabled.
 INFO [main] 2011-05-12 19:02:21,052 DatabaseDescriptor.java (line 121) Loading 
settings from 
file:/h/jcipar/Projects/HP/OtherDBs/Cassandra/apache-cassandra-0.7.5/conf/cassandra.yaml
 INFO [main] 2011-05-12 19:02:21,178 DatabaseDescriptor.java (line 181) 
DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
 INFO [main] 2011-05-12 19:02:21,310 SSTableReader.java (line 154) Opening 
/mnt/scratch/jcipar/cassandra/data/system/Schema-f-1
 INFO [main] 2011-05-12 19:02:21,327 SSTableReader.java (line 154) Opening 
/mnt/scratch/jcipar/cassandra/data/system/Schema-f-2
 INFO [main] 2011-05-12 19:02:21,336 SSTableReader.java (line 154) Opening 
/mnt/scratch/jcipar/cassandra/data/system/Migrations-f-1
 INFO [main] 2011-05-12 19:02:21,337 SSTableReader.java (line 154) Opening 
/mnt/scratch/jcipar/cassandra/data/system/Migrations-f-2
 INFO [main] 2011-05-12 19:02:21,342 SSTableReader.java (line 154) Opening 
/mnt/scratch/jcipar/cassandra/data/system/LocationInfo-f-2
 INFO [main] 2011-05-12 19:02:21,344 SSTableReader.java (line 154) Opening 
/mnt/scratch/jcipar/cassandra/data/system/LocationInfo-f-1
 INFO