Re: Running sstableloader from every node when migrating?

2015-11-30 Thread Robert Coli
On Thu, Nov 19, 2015 at 7:01 AM, George Sigletos 
wrote:

> We would like to migrate one keyspace from a 6-node cluster to a 3-node
> one.
>

http://www.pythian.com/blog/bulk-loading-options-for-cassandra/

=Rob


Re: Strategy tools for taking snapshots to load in another cluster instance

2015-11-30 Thread Robert Coli
On Wed, Nov 18, 2015 at 2:23 AM, Anishek Agarwal  wrote:

> We have a 5-node prod cluster and a 3-node test cluster. Is there a way I
> can take a snapshot of a table in prod and load it into the test cluster?
> The Cassandra versions are the same.
>

http://www.pythian.com/blog/bulk-loading-options-for-cassandra/

=Rob


Re: Does the rebuild tool rebuild everything each time it starts, or just the rest?

2015-11-30 Thread Robert Coli
On Fri, Nov 20, 2015 at 8:54 AM, wateray  wrote:

> Does it rebuild all token ranges which belong to the node, or just the
> remaining token ranges since the last rebuild? (Since the last rebuild we
> have received some data.)
>

There is no resume in versions before 2.2.x; it will duplicate-rebuild
anything that has already been rebuilt.

https://issues.apache.org/jira/browse/CASSANDRA-8494
and especially
https://issues.apache.org/jira/browse/CASSANDRA-8838
which should apply to "rebuild" as well as "bootstrap" ...

=Rob


Re: Cassandra 3.0.0 connection problem

2015-11-30 Thread Robert Coli
On Wed, Nov 18, 2015 at 11:13 PM, Enrico Sola 
wrote:

> Hi, I'm new to Cassandra and I've recently upgraded to 3.0.0 on Ubuntu
> Linux 14.04 LTS
>

https://www.eventbrite.com/engineering/what-version-of-cassandra-should-i-run/

=Rob


Questions about StorageServiceMBean.forceRepairRangeAsync()

2015-11-30 Thread Lu, Boying
Hi, All,

We plan to upgrade Cassandra from 2.0.17 to the latest release 2.2.3 in our 
product.

We use:

/**
 * Same as forceRepairAsync, but handles a specified range
 */
public int forceRepairRangeAsync(String beginToken, String endToken,
        final String keyspaceName, boolean isSequential, boolean isLocal,
        final String... columnFamilies);

(defined in StorageServiceMBean.java) to trigger a repair in Cassandra 2.0.17.

But this interface is marked as @Deprecated in 2.2.3 and has the following
prototype:

@Deprecated
public int forceRepairRangeAsync(String beginToken, String endToken,
        String keyspaceName, boolean isSequential, boolean isLocal,
        boolean repairedAt, String... columnFamilies);

So my questions are:

1. If we continue to use this interface, should we set 'repairedAt' to true
   or false?

2. If we don't use this interface, which alternative API should we use?

Thanks

Boying




Re: Questions about StorageServiceMBean.forceRepairRangeAsync()

2015-11-30 Thread Paulo Motta
Hello Boying,

1. "repairedAt" actually means "fullRepair", so set that to true if you
want to run ordinary/full repair or false if you want to run incremental
repair.
2. You should use StorageServiceMBean.repairAsync(String, Map), where the options map will be parsed by
org.apache.cassandra.repair.messages.RepairOption.parse()

I will add a deprecation message and rename the repairedAt field to
fullRepair.
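
For illustration, here is a minimal sketch of how repairAsync() might be
invoked over JMX from client code. Treat it as an assumption to check against
the 2.2.3 source rather than a definitive recipe: the option keys used below
("incremental", "parallelism", "ranges", "columnFamilies") and their values
are my reading of RepairOption.parse(), and the keyspace name, table name and
token values are placeholders.

import java.util.HashMap;
import java.util.Map;
import javax.management.JMX;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import org.apache.cassandra.service.StorageServiceMBean;

public class RangeRepairSketch {
    public static void main(String[] args) throws Exception {
        // Connect to the node's JMX endpoint (7199 is the usual default port).
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            StorageServiceMBean ss = JMX.newMBeanProxy(mbs,
                    new ObjectName("org.apache.cassandra.db:type=StorageService"),
                    StorageServiceMBean.class);

            // Plain string options, parsed server-side by RepairOption.parse().
            Map<String, String> options = new HashMap<>();
            options.put("incremental", "false");       // false = full repair (the old repairedAt/fullRepair flag)
            options.put("parallelism", "sequential");   // roughly the old isSequential=true
            // Assumed format: comma-separated "startToken:endToken" pairs.
            options.put("ranges", "<beginToken>:<endToken>");
            options.put("columnFamilies", "my_table");  // hypothetical table name

            // Returns a command number; the repair itself runs asynchronously.
            int cmd = ss.repairAsync("my_keyspace", options); // hypothetical keyspace
            System.out.println("Submitted repair command #" + cmd);
        }
    }
}

The call returns immediately; progress has to be tracked separately, for
example by watching the node's logs or the JMX notifications that nodetool
repair itself relies on.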

Thanks!

Paulo

2015-11-30 2:13 GMT-08:00 Lu, Boying :

> Hi, All,
>
> We plan to upgrade Cassandra from 2.0.17 to the latest release 2.2.3 in
> our product.
>
> We use:
>
> /**
>  * Same as forceRepairAsync, but handles a specified range
>  */
> public int forceRepairRangeAsync(String beginToken, String endToken,
>         final String keyspaceName, boolean isSequential, boolean isLocal,
>         final String... columnFamilies);
>
> (defined in StorageServiceMBean.java) to trigger a repair in Cassandra
> 2.0.17.
>
> But this interface is marked as @Deprecated in 2.2.3 and has the following
> prototype:
>
> @Deprecated
> public int forceRepairRangeAsync(String beginToken, String endToken,
>         String keyspaceName, boolean isSequential, boolean isLocal,
>         boolean repairedAt, String... columnFamilies);
>
> So my questions are:
>
> 1. If we continue to use this interface, should we set 'repairedAt' to
>    true or false?
>
> 2. If we don't use this interface, which alternative API should we use?
>
> Thanks
>
> Boying


Re: Moving SSTables from one disk to another

2015-11-30 Thread S C
Rob,


It is inevitable that repairs are needed to keep consistency guarantees. Is
it worthwhile to consider RAID-0 as we get more storage? One can treat the
loss of a disk as the loss of the node, then rebuild the node and repair. Any
other suggestions are most welcome.


-Sri

From: Robert Coli 
Sent: Friday, April 10, 2015 6:51 PM
To: user@cassandra.apache.org
Subject: Re: Moving SSTables from one disk to another

On Fri, Apr 10, 2015 at 4:30 PM, Jonathan Haddad wrote:
However, it was pointed out to me that
https://issues.apache.org/jira/browse/CASSANDRA-6696 will be a better
solution in a lot of cases.

Thank you for the interesting link about a theoretical usage which would make 
JBOD worth using.

But I really don't understand why we consider the use of the current JBOD OK,
when:

"In JBOD, when someone gets a bad drive, the bad drive is replaced with a new 
empty one and repair is run. This can cause deleted data to come back in some 
cases."

This class of issue is permanently fatal to consistency for the affected data.

Why are we encouraging people to expose themselves to this class of issue? What 
benefit do they get from current JBOD implementation that is worth this risk to 
consistency?

Yes, it's true that if an operator in this case never creates tombstones or 
never runs repair after losing only one disk, they're not exposed to the risk. 
But when they configure JBOD, the entire point is that they hope to run repair 
after losing only one disk, instead of rebuilding the entire node. The status 
quo seems to set up operators for failure when they attempt to do what the 
feature claims to be useful for.

I don't get "features" like this : questionable benefit, measurable risk, known 
serious issues and yet they sit there in the product for years on end, daring 
someone to use them...

=Rob


Re: handling down node cassandra 2.0.15

2015-11-30 Thread Robert Coli
On Wed, Nov 18, 2015 at 6:16 AM, Anuj Wadehra 
wrote:

> Suppose gc_grace_seconds = 10 days, max hinted handoff period = 3 hrs, there
> are 3 nodes A, B & C, RF = 3, and my client is reading at CL ONE. C remains
> down for 5 hours and misses many updates, including those which happened
> after the max hinted handoff period of 3 hrs. Now I bring back node C with
> auto_bootstrap false and run repair. If the client queries at CL ONE and
> fetches a row which got updated after the max hinted handoff period, there
> is a very high possibility of the client getting stale data from node C. But
> as soon as node C has joined the ring, it will start participating in WRITEs.
>
> But if we follow the procedure you suggested, node C will come back and run
> repair but won't participate in reads until we join it to the cluster.
> During repair, if the client queries at CL ONE and fetches a row which got
> updated after the max hinted handoff period expired and was missed by node
> C, it will still get the latest data from A and B. So the integrity of the
> data is not lost, similar to the case when we auto_bootstrap with true.
> Additionally, we save the unique data of node C. While repair is going on,
> node C will get all the writes.
>

Yes, during this time, C is getting "extra" writes as it is repairing
itself vis-à-vis A and B, but it is not serving reads.

=Rob


Re: Moving SSTables from one disk to another

2015-11-30 Thread Robert Coli
On Mon, Nov 30, 2015 at 11:29 AM, S C  wrote:

> It is inevitable that repairs are needed to keep consistency guarantees. Is
> it worthwhile to consider RAID-0 as we get more storage? One can treat the
> loss of a disk as the loss of the node, then rebuild the node and repair.
> Any other suggestions are most welcome.
>
It depends on whether you consider decreasing "unique replica count" to be
acceptable.

Rebuilding node C from the contents of A and B by definition loses any data
that was only successfully written to C.

In practice, the Coli Conjecture suggests you probably don't care about
decreasing unique replica count if you're, for example, using
ConsistencyLevel.ONE...

=Rob


Re: Huge ReadStage Pending tasks during startup

2015-11-30 Thread Robert Coli
On Fri, Nov 27, 2015 at 2:52 AM, Vasiliy I Ozerov 
wrote:

> We have some strange trouble with Cassandra startup. The cluster consists
> of 4 nodes; each node has 32 GB of RAM, about 30 GB of data, and 8 CPUs.
>
> So, just after startup it has 2,753,202 pending ReadStage tasks, and it
> takes about 11 hours to complete them all.
>

I'd presume that it's doing some wacky cache warming, but I have no idea what.

=Rob


Re: Cassandra Cleanup and disk space

2015-11-30 Thread Robert Coli
On Thu, Nov 26, 2015 at 12:55 AM, Luigi Tagliamonte 
wrote:

> I'd like to understand what cleanup does on a running cluster when there
> is no cluster topology change. I did a test and saw the cluster's disk
> space shrink by 200 GB.
>

"writes out files 1:1 with their input files"

IIRC it does not delete single-sstable tombstones, but it might... ?

Most likely is what other poster said, that you didn't run cleanup at some
time in the past.

=Rob


Re: Running sstableloader from every node when migrating?

2015-11-30 Thread anuja jain
Hello George,
You can use sstable2json to dump your keyspace's SSTables to JSON, and then
load that JSON into your keyspace in the new cluster using the json2sstable
utility.

On Tue, Dec 1, 2015 at 3:06 AM, Robert Coli  wrote:

> On Thu, Nov 19, 2015 at 7:01 AM, George Sigletos 
> wrote:
>
>> We would like to migrate one keyspace from a 6-node cluster to a 3-node
>> one.
>>
>
> http://www.pythian.com/blog/bulk-loading-options-for-cassandra/
>
> =Rob
>
>


Re: Help diagnosing performance issue

2015-11-30 Thread Ryan Svihla
Honestly 20ms for spinning disks is really good, so I think you're just
dealing with the reality of having a certain percentage of your reads off
disk and not in memory. If you're reading data that is on older SSTables
and you're out of buffer cache, I'm not sure how else you could improve that.

Sounds like a physics problem to me.

On Wed, Nov 25, 2015 at 10:05 AM, Antoine Bonavita 
wrote:

> Sebastian (and others, help is always appreciated),
>
> After being OK for 24h, read latencies started to degrade (up to 20ms) and
> I had to ramp down volumes again.
>
> The degradation is clearly linked to the number of read IOPS, which went up
> to 1.65k/s after 24h.
>
> If anybody can give me hints on what I should look at, I'm very happy to
> do so.
>
> A.
>
>
> On 11/23/2015 12:07 PM, Antoine Bonavita wrote:
>
>> Sebastian,
>>
>> I tried to ramp up volume with this new setting and ran into the same
>> problems.
>>
>> After that I restarted my nodes. This pretty much instantly got read
>> latencies back to normal (< 5ms) on the 32G nodes.
>>
>> I am currently ramping up volumes again and here is what I am seeing on
>> 32G nodes:
>> * Read latencies are OK (<5ms)
>> * A lot of read IOPS (~400 reads/s)
>> * I enabled logging for DateTieredCompactionStrategy and I only get lines
>> like these:
>> DEBUG [CompactionExecutor:186] 2015-11-23 12:02:45,915
>> DateTieredCompactionStrategy.java:137 - Compaction buckets are []
>> DEBUG [CompactionExecutor:186] 2015-11-23 12:03:16,704
>> DateTieredCompactionStrategy.java:137 - Compaction buckets are
>>
>> [[BigTableReader(path='/var/lib/cassandra/data/views/views-451e4d8061ef11e5896f091196a360a0/la-6452-big-Data.db')]]
>>
>> * When I run pcstats I still get about 100 *-Data.db files loaded at 15%
>> (which is what I was seeing with max_sstable_age_days set at 5).
>>
>> I'm really happy with the first item in my list but the other items seem
>> to indicate something is still wrong and it does not look like it's
>> compaction.
>>
>> Any help would be truly appreciated.
>>
>> A.
>>
>> On 11/20/2015 12:58 AM, Antoine Bonavita wrote:
>>
>>> Sebastian,
>>>
>>> I took into account your suggestion and set max_sstable_age_days to 1.
>>>
>>> I left the TTL at 432000 and gc_grace_seconds at 172800. So I expect
>>> SSTables older than 7 days to get deleted. Am I right?
>>>
>>> I did not change dclocal_read_repair_chance because I have only one DC
>>> at this point in time. Did you mean that I should set read_repair_chance
>>> to 0?
>>>
>>> Thanks again for your time and help. Really appreciated.
>>>
>>> A.
>>>
>>>
>>> On 11/19/2015 02:36 AM, Sebastian Estevez wrote:
>>>
 When you say drop, you mean reduce the value (to 1 day, for example),
 not "don't set the value", right?


 Yes.

 If I set max sstable age days to 1, my understanding is that
 SSTables with expired data (5 days) are not going to be compacted
 ever, and therefore my disk usage will keep growing forever. Did I
 miss something here?


 We will expire sstables whose highest TTL is beyond gc_grace_seconds as of
 CASSANDRA-5228 (https://issues.apache.org/jira/browse/CASSANDRA-5228). This
 is nice because the sstable is just dropped for free, with no need to scan
 it and remove tombstones (which is very expensive), and DTCS will guarantee
 that all the data within an sstable is close together in time.

 So, if I set max sstable age days to 1, I have to run repairs at
 least once a day, correct?

 I'm afraid I don't get your point about painful compactions.


 I was referring to the problems described in CASSANDRA-9644
 (https://issues.apache.org/jira/browse/CASSANDRA-9644).




 All the best,


 Sebastián Estévez
 Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

 On Wed, Nov 18, 2015 at 5:53 PM, Antoine Bonavita wrote: