[RELEASE] Apache Cassandra 1.0.8 released

2012-02-27 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 1.0.8.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a maintenance/bug-fix release[1]. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter
any problems.

Have fun!

[1]: http://goo.gl/AOfv9 (CHANGES.txt)
[2]: http://goo.gl/eFTBA (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


How to reduce the memory consumed by cassandra (so as to prevent crashes & OOMs) ?

2012-02-27 Thread Aditya Gupta
I'm running a 4-node Cassandra cluster of VMware Ubuntu instances, each with
768MB of memory (on a single machine, for development purposes). I need to
reduce the heap size appropriately, as my nodes have been crashing at times
with OOMs. How do I configure this? I think I would need to tweak
MAX_HEAP_SIZE & HEAP_NEWSIZE in cassandra-env.sh, but I'm not sure what the
correct values would be for my case.

What would the values for these parameters be if I had just 512MB for each
node?
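
For reference, a minimal sketch of where these overrides live; the values
below are only illustrative starting points for a 768MB VM, not tested
recommendations:

  # conf/cassandra-env.sh -- uncomment/override near the top of the file
  MAX_HEAP_SIZE="400M"   # total JVM heap; leave headroom for the OS page cache
  HEAP_NEWSIZE="100M"    # young generation, commonly about 1/4 of MAX_HEAP_SIZE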


Using cassandra at minimal expenditures

2012-02-27 Thread Ertio Lew
Hi

I'm creating a networking site using Cassandra. I want to host this
application starting with the lowest possible resources & then slowly
increase them as the service's demand & need grow.

1. I am wondering: what is the minimum recommended cluster size to start
with?
Are there any issues if I start with as few as 2 nodes in the cluster?
In that case I guess I would have a replication factor of 2.
(This way I would require at minimum 3 VPSes: 1 as the web server & the
other 2 for the Cassandra cluster, right?)

2. Is anyone using Cassandra with such minimal resources in
production environments? Any experiences or difficulties encountered?

3. Would you like to recommend a hosting service suitable for me, or to
suggest other ways to minimize the resources (really, the hosting expenses)?


Re: MemtableThroughput test in ColumnFamily.apply

2012-02-27 Thread aaron morton
That sounds odd because it is checked after each row is added to the memtable. 

What are you seeing logged when the memtable flushes ? It will say how many ops 
and how many (tracked) bytes. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/02/2012, at 7:48 AM, Thomas Richter wrote:

> Hi,
> 
> I agree, but the problem in our case is that we have rather small
> memtables (5MB) and during hinted handoff there are several dozen or
> even hundreds of MBs inserted without flushing the tables. And in that
> case it makes a difference.
> 
> Best,
> 
> Thomas
> 
> On 02/25/2012 07:06 PM, Jonathan Ellis wrote:
>> Makes sense to me, although I don't see it making a material
>> difference when there are 1000 mutations in a memtable vs 1001.
>> 
>> On Sat, Feb 25, 2012 at 11:23 AM, Thomas Richter  wrote:
>>> Hi,
>>> 
>>> while hunting down some memory consumption issues in 0.7.10 I realized
>>> that MemtableThroughput condition is tested before writing the new data.
>>> As this causes memtables to grow larger than expected I changed
>>> 
>>> Memtable apply(DecoratedKey key, ColumnFamily columnFamily)
>>>   {
>>>   long start = System.nanoTime();
>>> 
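>>>   // NOTE: the threshold is tested *before* the new row is written, so
>>>   // the memtable can grow past MemtableThroughput by the size of the put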
>>>   boolean flushRequested = memtable.isThresholdViolated();
>>>   memtable.put(key, columnFamily);
>>>   ColumnFamily cachedRow = getRawCachedRow(key);
>>>   if (cachedRow != null)
>>>   cachedRow.addAll(columnFamily);
>>>   writeStats.addNano(System.nanoTime() - start);
>>> 
>>>   return flushRequested ? memtable : null;
>>>   }
>>> 
>>> to
>>> 
>>> Memtable apply(DecoratedKey key, ColumnFamily columnFamily)
>>>   {
>>>   long start = System.nanoTime();
>>> 
>>> 
>>>   memtable.put(key, columnFamily);
>>>   ColumnFamily cachedRow = getRawCachedRow(key);
>>>   if (cachedRow != null)
>>>   cachedRow.addAll(columnFamily);
>>>   writeStats.addNano(System.nanoTime() - start);
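>>>   // moved below the put: the write that violates the threshold now
>>>   // flags its memtable for flushing immediately, not on the next apply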
>>>   boolean flushRequested = memtable.isThresholdViolated();
>>>   return flushRequested ? memtable : null;
>>>   }
>>> 
>>> Are there any objections to this change? So far it works for me.
>>> 
>>> Best,
>>> 
>>> Thomas
>> 
>> 
>> 



Re: Using cassandra at minimal expenditures

2012-02-27 Thread Dave Brosius
I guess the issue with 2 machines and RF=2 is that a consistency level of
QUORUM is the same as ALL, so you have pretty much no flexibility with this
setup; of course this might be fine depending on what you want to do. In
addition, RF=2 also means that you get no data-storage benefit from being
distributed. Having said that, I know there are folks who run 2-machine
clusters.

dave
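
For reference, the arithmetic behind that point, using the standard quorum
formula:

  quorum = floor(RF / 2) + 1
  RF = 2  ->  quorum = 2, the same as ALL: one node down blocks QUORUM reads/writes
  RF = 3  ->  quorum = 2, so one node can be down and QUORUM still succeeds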

Re: CounterColumn java.lang.AssertionError: Wrong class type.

2012-02-27 Thread aaron morton
To rule out the obvious problem, can you check that the nodes have the same
schema? Use cassandra-cli and run describe cluster.
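
For anyone unfamiliar with that check, a minimal sketch (host and port are
illustrative):

  $ cassandra-cli -h localhost -p 9160
  [default@unknown] describe cluster;

A healthy cluster reports a single schema version with all nodes listed under
it; more than one version indicates schema disagreement.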

It looks like one of the nodes involved in the read has sent the wrong sort of 
column for the CF. That's not the sort of thing that normally happens.

Otherwise are you able to capture some debug logging on the nodes involved in 
the request?


Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 27/02/2012, at 3:54 PM, Gary Ogasawara wrote:

> Using v1.0.7, we see many of the following errors.
> Any thoughts on why this is occurring?
> Thanks in advance.
> -gary
> 
> ERROR [ReadRepairStage:9] 2012-02-24 18:31:28,623 AbstractCassandraDaemon.java
> (line 139) Fatal exception in thread Thread[ReadRepairStage:9,5,main]
> java.lang.AssertionError: Wrong class type.
>    at org.apache.cassandra.db.CounterColumn.diff(CounterColumn.java:112)
>    at org.apache.cassandra.db.ColumnFamily.diff(ColumnFamily.java:230)
>    at org.apache.cassandra.db.ColumnFamily.diff(ColumnFamily.java:309)
>    at org.apache.cassandra.service.RowRepairResolver.scheduleRepairs(RowRepairResolver.java:117)
>    at org.apache.cassandra.service.RowRepairResolver.resolve(RowRepairResolver.java:94)
>    at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:54)
>    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>    at java.lang.Thread.run(Thread.java:722)
> ERROR [ReadRepairStage:9] 2012-02-24 18:31:28,625 AbstractCassandraDaemon.java
> (line 139) Fatal exception in thread Thread[ReadRepairStage:9,5,main]
> --
> 
> From cassandra-cli "show schema", I think the relevant CF is:
> 
> create column family QOSCounters
>  with column_type = 'Standard'
>  and comparator = 'UTF8Type'
>  and default_validation_class = 'CounterColumnType'
>  and key_validation_class = 'UTF8Type'
>  and rows_cached = 0.0
>  and row_cache_save_period = 0
>  and row_cache_keys_to_save = 2147483647
>  and keys_cached = 20.0
>  and key_cache_save_period = 14400
>  and read_repair_chance = 1.0
>  and gc_grace = 604800
>  and min_compaction_threshold = 4
>  and max_compaction_threshold = 32
>  and replicate_on_write = true
>  and row_cache_provider = 'SerializingCacheProvider'
>  and compaction_strategy =
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy';
> 
> 



Re: Frequency of Flushing in 1.0

2012-02-27 Thread aaron morton
> Isn't decommission meant to do the same thing as disablethrift and gossip?

decommission removes a node entirely from the cluster, including streaming its
data to other nodes.

disablethrift and disablegossip just stop it from responding to clients and 
other nodes. 
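
For reference, a sketch of taking a node out of service without removing it
from the ring (the host placeholder is illustrative):

  nodetool -h <host> disablegossip   # stop announcing this node to the ring
  nodetool -h <host> disablethrift   # stop serving client (thrift) requests
  nodetool -h <host> drain           # optional: flush memtables before a restart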

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 27/02/2012, at 2:26 PM, Mohit Anchlia wrote:

> 
> 
> On Sun, Feb 26, 2012 at 12:18 PM, aaron morton  
> wrote:
> Nathan Milford has a post about taking a node down 
> 
> http://blog.milford.io/2011/11/rolling-upgrades-for-cassandra/
> 
> The only thing I would do differently would be to turn off thrift first.
> 
> Cheers
>  
> Isn't decommission meant to do the same thing as disablethrift and gossip?
> 
> 
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 27/02/2012, at 4:35 AM, Edward Capriolo wrote:
> 
>> If you are doing a planned maintenance you can flush first as well,
>> ensuring that the commit logs will not be as large.
>> 
>> On Sun, Feb 26, 2012 at 10:09 AM, Radim Kolar  wrote:
 if a node goes down, it will take longer for commitlog replay.
>>> 
>>> commit log replay time is insignificant. most time during node startup is
>>> wasted on index sampling. Index sampling here runs for about 15 minutes.
> 
> 



Re: Frequency of Flushing in 1.0

2012-02-27 Thread aaron morton
Yes, reducing commitlog_total_space_in_mb will reduce the amount of space
needed by the commit logs.

> memtable_total_space_in_mb
controls how often memtables are flushed to disk as sstables; it does not
really affect the commit log, other than the fact that a commit log segment
cannot be deleted until the changes it holds have been flushed to sstables.
commitlog_total_space_in_mb is the correct way to control commit log size.
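
To make that concrete, a cassandra.yaml sketch; the values are illustrative
only:

  # total on-disk commit log allowed; when exceeded, the memtables holding the
  # oldest segments' data are flushed so those segments can be recycled
  commitlog_total_space_in_mb: 4096
  # global ceiling on memtable memory; the largest memtables are flushed first
  memtable_total_space_in_mb: 1024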

Hope that helps. 

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 27/02/2012, at 4:48 PM, Xaero S wrote:

> 
> The challenge we face is that our commitlog disk capacity is much
> less (under 10 GB in some cases) than the disk capacity for SSTables, so we
> cannot let the commitlog data grow continuously. This is the
> reason we need to be able to tune the way we flush the memtables.
> From this link -
> http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management
>  - it looks like "commitlog_total_space_in_mb" is the parameter to control
> the rate at which memtables get flushed, and it seems
> "memtable_total_space_in_mb" is another setting to play with.
> We are planning to do some load testing with changes to these two settings,
> but can anyone confirm that I am headed in the right direction? Or any other
> pointers on this?
> 
> 
> On Sun, Feb 26, 2012 at 5:26 PM, Mohit Anchlia  wrote:
> 
> 
> On Sun, Feb 26, 2012 at 12:18 PM, aaron morton  
> wrote:
> Nathan Milford has a post about taking a node down 
> 
> http://blog.milford.io/2011/11/rolling-upgrades-for-cassandra/
> 
> The only thing I would do differently would be to turn off thrift first.
> 
> Cheers
>  
> Isn't decommission meant to do the same thing as disablethrift and gossip?
> 
> 
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 27/02/2012, at 4:35 AM, Edward Capriolo wrote:
> 
>> If you are doing a planned maintenance you can flush first as well,
>> ensuring that the commit logs will not be as large.
>> 
>> On Sun, Feb 26, 2012 at 10:09 AM, Radim Kolar  wrote:
 if a node goes down, it will take longer for commitlog replay.
>>> 
>>> commit log replay time is insignificant. most time during node startup is
>>> wasted on index sampling. Index sampling here runs for about 15 minutes.
> 
> 
> 



Re: newer Cassandra + Hadoop = TimedOutException()

2012-02-27 Thread aaron morton
What settings do you have for cassandra.range.batch.size and rpc_timeout_in_ms?
Have you tried reducing the first and/or increasing the second?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:

> On Sun, Feb 26, 2012 at 04:25, Edward Capriolo  wrote:
>> Did you see the notes here?
> 
> I'm not sure what you mean by the notes?
> 
> I'm using the mapred.* settings suggested there:
> 
> <property>
>   <name>mapred.max.tracker.failures</name>
>   <value>20</value>
> </property>
> <property>
>   <name>mapred.map.max.attempts</name>
>   <value>20</value>
> </property>
> <property>
>   <name>mapred.reduce.max.attempts</name>
>   <value>20</value>
> </property>
> 
> But I still see the timeouts that I didn't see with cassandra-all 0.8.7.
> 
> P.
> 
>> http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting



Re: How to reduce the memory consumed by cassandra (so as to prevent crashes & OOMs) ?

2012-02-27 Thread aaron morton
MAX_HEAP_SIZE="500M"
HEAP_NEWSIZE="100M"

That is a very small amount of memory for Java; you are probably going to have
problems. Take a look at reducing these settings in cassandra.yaml to lower
the amount of memory used.

memtable_total_space_in_mb
memtable_flush_queue_size
in_memory_compaction_limit_in_mb
concurrent_compactors 
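
A hedged example of dialing those down for a memory-starved development node;
the numbers are guesses to start experimenting from, not recommendations:

  # cassandra.yaml
  memtable_total_space_in_mb: 64        # global memtable ceiling
  memtable_flush_queue_size: 2          # fewer memtables queued for flushing
  in_memory_compaction_limit_in_mb: 16  # larger rows compact via disk instead
  concurrent_compactors: 1              # one compaction at a time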


You may also find this useful https://github.com/pcmanus/ccm

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/02/2012, at 6:50 AM, Aditya Gupta wrote:

> I'm running a 4-node Cassandra cluster of VMware Ubuntu instances, each with
> 768MB of memory (on a single machine, for development purposes). I need to
> reduce the heap size appropriately, as my nodes have been crashing at times
> with OOMs. How do I configure this? I think I would need to tweak
> MAX_HEAP_SIZE & HEAP_NEWSIZE in cassandra-env.sh, but I'm not sure what the
> correct values would be for my case.
> 
> What would the values for these parameters be if I had just 512MB for each
> node? 



Re: Using cassandra at minimal expenditures

2012-02-27 Thread aaron morton
> 1. I am wondering what is the minimum recommended cluster size to start with? 

IMHO 3
http://thelastpickle.com/2011/06/13/Down-For-Me/

A

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/02/2012, at 8:17 AM, Ertio Lew wrote:

> Hi
> 
> I'm creating a networking site using Cassandra. I want to host this
> application starting with the lowest possible resources & then slowly
> increase them as the service's demand & need grow.
> 
> 1. I am wondering: what is the minimum recommended cluster size to start with?
> Are there any issues if I start with as few as 2 nodes in the cluster? In
> that case I guess I would have a replication factor of 2.
> (This way I would require at minimum 3 VPSes: 1 as the web server & the
> other 2 for the Cassandra cluster, right?)
> 
> 2. Is anyone using Cassandra with such minimal resources in production
> environments? Any experiences or difficulties encountered?
> 
> 3. Would you like to recommend a hosting service suitable for me, or to
> suggest other ways to minimize the resources (really, the hosting expenses)?



Cassndra 1.0.6 GC query

2012-02-27 Thread Roshan
Hi Experts

After getting an OOM error in production, I reduced
-XX:CMSInitiatingOccupancyFraction to .45 (from .75) and
flush_largest_memtables_at to .45 (from .75). But I still get a warning
message in production for the same Cassandra node regarding OOM. I also
reduced the concurrent compactions to one.
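
For anyone following along, a sketch of where those two knobs live; note that
the JVM flag takes an integer percentage, so the snippet below assumes that is
what was actually set:

  # conf/cassandra-env.sh
  JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=45"

  # conf/cassandra.yaml
  flush_largest_memtables_at: 0.45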

2012-02-27 08:01:12,913 WARN  [GCInspector] Heap is 0.45604122065696395
full.  You may need to reduce memtable and/or cache sizes.  Cassandra will
now flush up to the two largest memtables to free up memory.  Adjust
flush_largest_memtables_at threshold in cassandra.yaml if you don't want
Cassandra to do this automatically
2012-02-27 08:01:12,913 WARN  [StorageService] Flushing
CFS(Keyspace='WCache', ColumnFamily='WStandard') to relieve memory pressure

Could someone please explain why I am still getting GC warnings like the above?
Many thanks.

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tp7323457p7323457.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Cassndra 1.0.6 GC query

2012-02-27 Thread Jonathan Ellis
Take a heap dump (there should be one from when you OOMed) and see
what is consuming your memory.

On Mon, Feb 27, 2012 at 3:45 PM, Roshan  wrote:
> Hi Experts
>
> After getting an OOM error in production, I reduced
> -XX:CMSInitiatingOccupancyFraction to .45 (from .75) and
> flush_largest_memtables_at to .45 (from .75). But I still get a warning
> message in production for the same Cassandra node regarding OOM. I also
> reduced the concurrent compactions to one.
>
> 2012-02-27 08:01:12,913 WARN  [GCInspector] Heap is 0.45604122065696395
> full.  You may need to reduce memtable and/or cache sizes.  Cassandra will
> now flush up to the two largest memtables to free up memory.  Adjust
> flush_largest_memtables_at threshold in cassandra.yaml if you don't want
> Cassandra to do this automatically
> 2012-02-27 08:01:12,913 WARN  [StorageService] Flushing
> CFS(Keyspace='WCache', ColumnFamily='WStandard') to relieve memory pressure
>
> Could someone please explain why I am still getting GC warnings like the above?
> Many thanks.
>
> --
> View this message in context: 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tp7323457p7323457.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
> Nabble.com.



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Server crashed due to "OutOfMemoryError: Java heap space"

2012-02-27 Thread Jonathan Ellis
What does the heap dump show is using the memory?

On Fri, Feb 24, 2012 at 3:14 PM, Feng Qu  wrote:
> Hello,
>
> We have a 6-node ring running 0.8.6 on RHEL 6.1. The first node also runs
> OpsCenter community. This node has crashed a few times recently with
> "OutOfMemoryError: Java heap space" while several compactions on a few 200-300
> GB SSTables were running. We are using an 8GB Java heap on a host with 96GB RAM.
>
> I would appreciate for help to figure out the root cause and solution.
>
> Feng Qu
>
>
>  INFO [GossipTasks:1] 2012-02-22 13:15:59,135 Gossiper.java (line 697)
> InetAddress /10.89.74.67 is now dead.
>  INFO [ScheduledTasks:1] 2012-02-22 13:16:12,114 StatusLogger.java (line 65)
> ReadStage                         0         0         0
> ERROR [CompactionExecutor:10538] 2012-02-22 13:16:12,115
> AbstractCassandraDaemon.java (line 139) Fatal exception in thread
> Thread[CompactionExecutor:10538,1,main]
> java.lang.OutOfMemoryError: Java heap space
>         at org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:123)
>         at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:57)
>         at org.apache.cassandra.io.sstable.SSTableReader.getDirectScanner(SSTableReader.java:664)
>         at org.apache.cassandra.db.compaction.CompactionIterator.getCollatingIterator(CompactionIterator.java:92)
>         at org.apache.cassandra.db.compaction.CompactionIterator.<init>(CompactionIterator.java:68)
>         at org.apache.cassandra.db.compaction.CompactionManager.doCompactionWithoutSizeEstimation(CompactionManager.java:553)
>         at org.apache.cassandra.db.compaction.CompactionManager.doCompaction(CompactionManager.java:507)
>         at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:142)
>         at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:108)
>         at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
>         at java.util.concurrent.FutureTask.run(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>         at java.lang.Thread.run(Unknown Source)
>  INFO [GossipTasks:1] 2012-02-22 13:16:12,115 Gossiper.java (line 697)
> InetAddress /10.2.128.55 is now dead.
> ERROR [Thread-734] 2012-02-22 13:16:48,189 AbstractCassandraDaemon.java
> (line 139) Fatal exception in thread Thread[Thread-734,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
>         at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
>         at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
>         at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:490)
>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:136)
> ERROR [Thread-68450] 2012-02-22 13:16:48,189 AbstractCassandraDaemon.java
> (line 139) Fatal exception in thread Thread[Thread-68450,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
>         at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
>         at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor.ensureQueuedTaskHandled(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
>         at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:490)
>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:136)
> ERROR [Thread-731] 2012-02-22 13:16:48,189 AbstractCassandraDaemon.java
> (line 139) Fatal exception in thread Thread[Thread-731,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
>         at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
>         at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor.ensureQueuedTaskHandled(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
>         at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:490)
>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:136)
> ERROR [Thread-736] 2012-02-22 13:16:48,186 AbstractCassandraDaemon.java
> (line 139) Fatal exception in thread Thread[Thread-736,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
>         at org.apache.cassandra.concurrent.DebuggableThreadPoolExecut

Re: Cassndra 1.0.6 GC query

2012-02-27 Thread Roshan
Due to a configuration issue, I hadn't enabled the heap dump directory.

Is there another way to find the cause of this and identify possible
configuration changes?
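
(For reference, the standard HotSpot flags that would capture a dump on the
next OOM can be added in cassandra-env.sh; the path below is illustrative:

  JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
  JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=/var/lib/cassandra/heapdump"
)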

Thanks.

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tp7323457p7323690.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Please advise -- 750MB object possible?

2012-02-27 Thread Ben Coverston
GridFS for Cassandra here, take it FWIW. AFAIK Joaquin spent a few hours
putting this together at most.

https://github.com/joaquincasares/gratefs

-- 
Ben Coverston
DataStax -- The Apache Cassandra Company


Re: Cassndra 1.0.6 GC query

2012-02-27 Thread Ben Coverston
Heap dump is really the gold standard for analysis, but if you don't want
to take a heap dump for some reason:

1. Decrease the cache sizes
2. Increase the index interval size

These in combination may reduce pressure on the heap enough so you do not
see these warnings in the log.
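
As a rough illustration of (2): index_interval lives in cassandra.yaml, and a
larger value keeps fewer index samples on the heap at the cost of slightly
slower key lookups. The value below is illustrative:

  index_interval: 512   # default is 128; larger = less heap for index samples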

On Mon, Feb 27, 2012 at 4:12 PM, Roshan  wrote:

> Due to a configuration issue, I hadn't enabled the heap dump directory.
>
> Is there another way to find the cause of this and identify possible
> configuration changes?
>
> Thanks.
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tp7323457p7323690.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>



-- 
Ben Coverston
DataStax -- The Apache Cassandra Company


TimeUUID

2012-02-27 Thread Tamar Fraenkel
Hi!
I have a column family where I use rows as "time buckets".
What I do is take the epoch time in seconds and round it to 1 hour (taking
seconds-since-epoch divided by 3600).
My key validation type is LongType.
I wonder whether it would be better to use a TimeUUID or even a readable
string representation of the time?
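
For concreteness, a minimal Java sketch of the bucketing described above; the
names are illustrative:

  // hour-sized "time bucket" used as the row key (LongType key validation)
  long epochSeconds = System.currentTimeMillis() / 1000L;
  long hourBucket = epochSeconds / 3600L; // integer division floors to the hour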
Thanks,

-- 
*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956

Is this the correct data model thinking?

2012-02-27 Thread Blake Starkenburg
Using a user/member as an example, I am curious which of these data models
would be the best fit for performance and longevity of data in Cassandra.

Consider the simple staples of user/member details: username, email, address,
state, preferences, etc. Fairly simple: store this data under a single row
key, users->username[email], etc.

Now, as time goes on, more columns compound onto the user's row key, such as
snapshot changes like users->username['change:123456'] = 'changed email'.
Perhaps more preferences are added onto the row key, or login information. I
wouldn't expect the number of columns to grow hugely, but I've also learned
to plan for the unexpected...

Simplicity would tell me to:

A.) Store ALL the data associated with the user in a single users row key.
Some user rows may be small; others may get larger over time depending on
activity.

But would B be a better-performing model?

B.) Split user data out into separate row keys, such as
users->changes_username['change:123456'] = 'changed email' AND
users->preferences_username['fav_color'] = 'blue'. This would add a level of
complexity and, in some cases, tiny row keys, along with multiple fetches to
read all user/member data.
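
To make the two options concrete, a rough sketch of the row layouts being
compared; the names and values are illustrative:

  Option A -- everything in one row:
    users['bob'] = { 'email': ..., 'state': ..., 'fav_color': 'blue',
                     'change:123456': 'changed email', ... }

  Option B -- one row per concern:
    user_profile['bob']     = { 'email': ..., 'state': ... }
    user_preferences['bob'] = { 'fav_color': 'blue' }
    user_changes['bob']     = { 'change:123456': 'changed email' }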

I'm curious what your opinions are.

Thanks!


sstable image/pic ?

2012-02-27 Thread Franc Carter
Hi,

Does anyone know of a picture/image that shows the layout of
keys/columns/values in an sstable? I haven't been able to find one, and am
having a hard time visualising the layout from the various descriptions and
overviews.
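
Not a picture, but a rough text sketch of the 0.8/1.0-era layout, simplified
and from memory; the SSTableWriter/SSTableReader source is the authoritative
reference:

  <cf>-<gen>-Data.db   : the rows themselves, sorted by token
      row    = key | data size | (row bloom filter | column index, for wide rows)
               | deletion info | column count | column | column | ...
      column = name | flags | timestamp | value
  <cf>-<gen>-Index.db  : row key -> offset of that row in Data.db
  <cf>-<gen>-Filter.db : bloom filter over all row keys in the sstable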

thanks

-- 

*Franc Carter* | Systems architect | Sirca Ltd
 

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


Re: newer Cassandra + Hadoop = TimedOutException()

2012-02-27 Thread Patrik Modesto
Hi aaron,

these are our current settings:

  <property>
    <name>cassandra.range.batch.size</name>
    <value>1024</value>
  </property>

  <property>
    <name>cassandra.input.split.size</name>
    <value>16384</value>
  </property>

rpc_timeout_in_ms: 3

Regards,
P.

On Mon, Feb 27, 2012 at 21:54, aaron morton  wrote:
> What settings do you have for cassandra.range.batch.size
> and rpc_timeout_in_ms? Have you tried reducing the first and/or increasing
> the second?
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:
>
> On Sun, Feb 26, 2012 at 04:25, Edward Capriolo 
> wrote:
>
> Did you see the notes here?
>
>
> I'm not sure what you mean by the notes?
>
> I'm using the mapred.* settings suggested there:
>
> <property>
>   <name>mapred.max.tracker.failures</name>
>   <value>20</value>
> </property>
> <property>
>   <name>mapred.map.max.attempts</name>
>   <value>20</value>
> </property>
> <property>
>   <name>mapred.reduce.max.attempts</name>
>   <value>20</value>
> </property>
>
> But I still see the timeouts that I didn't see with cassandra-all 0.8.7.
>
> P.
>
> http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
>
>