Re: DataStax Spark driver performance for analytics workload

2017-10-10 Thread Stone Fang
@kurt greaves

I doubt that you need to read all the data. It is common for a Cassandra
cluster to hold a huge number of records; if you have to load all of them,
how can you analyse anything?
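
For example, a minimal sketch of what I mean (assuming the
spark-cassandra-connector Java API; the keyspace, table and time filter below
are made up), where the CQL predicate is pushed down to Cassandra instead of
loading every row into Spark:

    import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

    import java.util.Date;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    import com.datastax.spark.connector.japi.CassandraRow;

    public class FilteredScan
    {
        public static void main(String[] args)
        {
            SparkConf conf = new SparkConf()
                .setAppName("analytics")
                .set("spark.cassandra.connection.host", "127.0.0.1");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // where() ships the predicate to Cassandra, so only matching rows
            // reach Spark; Cassandra still scans its token ranges server-side.
            Date yesterday = new Date(System.currentTimeMillis() - 86400000L);
            JavaRDD<CassandraRow> recent = javaFunctions(sc)
                .cassandraTable("my_ks", "my_table")
                .where("publish > ?", yesterday);
            System.out.println(recent.count());
            sc.stop();
        }
    }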

On Mon, Oct 9, 2017 at 9:49 AM, kurt greaves  wrote:

> spark-cassandra-connector will provide the best way to achieve what you
> want, however under the hood it's still going to result in reading all the
> data, and because of the way Cassandra works it will essentially read the
> same SSTables multiple times from random points. You might be able to tune
> to make this not super bad, but pretty much reading all the data is going
> to have horrible implications for the cache if all your data doesn't fit in
> memory regardless of what you do.
>


cassandra unit test

2016-09-05 Thread Stone Fang
I call the QueryProcessor.execute method to insert data into a table in a
Cassandra unit test file:

  public static UntypedResultSet execute(String query, ConsistencyLevel cl, Object... values)
  throws RequestExecutionException
  {
      return execute(query, cl, internalQueryState(), values);
  }

I need the consistency level to be LOCAL_QUORUM, so I can't call the
CQLTester.execute method.

But I get the exception below when inserting data with this method. I
understand it is related to token ranges: ReplicationParams.validate calls

  TokenMetadata tmd = StorageService.instance.getTokenMetadata();

and tokenMetadata.sortedTokens() is empty, which is the cause of the
exception.

I want to know how to assign token ranges in a Cassandra unit test to solve
this problem. Does anyone have an idea about this?

thanks, stone

--- stack trace ---

java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at org.apache.cassandra.locator.TokenMetadata.firstToken(TokenMetadata.java:960)
at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:107)
at org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:3006)
at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:887)
at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:537)
at org.apache.cassandra.service.StorageProxy.mutateWithTriggers(StorageProxy.java:718)
at org.apache.cassandra.cql3.statements.ModificationStatement.executeWithoutCondition(ModificationStatement.java:431)
at org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:417)
at org.apache.cassandra.cql3.QueryProcessor.execute(QueryProcessor.java:296)
at org.apache.cassandra.cql3.QueryProcessor.execute(QueryProcessor.java:287)
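
For reference, here is the kind of test setup I imagine (my assumption, based
on C* 3.x internals and the default Murmur3Partitioner; I am not sure it is
the right approach):

    import java.net.InetAddress;

    import org.apache.cassandra.dht.Murmur3Partitioner;
    import org.apache.cassandra.locator.TokenMetadata;
    import org.apache.cassandra.service.StorageService;

    public class TokenSetup
    {
        // Register the local node for one token so that sortedTokens() is
        // non-empty and getNaturalEndpoints() can resolve replicas.
        public static void assignLocalToken() throws Exception
        {
            TokenMetadata tmd = StorageService.instance.getTokenMetadata();
            tmd.updateNormalToken(new Murmur3Partitioner.LongToken(0L),
                                  InetAddress.getByName("127.0.0.1"));
        }
    }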


Re: cassandra database design

2016-09-01 Thread Stone Fang
Thanks, Carlos.
The key point is how to balance spreading the data evenly around the cluster
against the number of partitions a query has to touch. It is hard to
determine the best trade-off. Anyway, thanks for your suggestions; they
helped me a lot.
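
For the archive, a hypothetical sketch of the seconds-bucket idea (assuming a
made-up `second` int column added to the partition key and the DataStax Java
driver; the 60 overlapping reads must be merged client-side):

    import java.util.ArrayList;
    import java.util.Date;
    import java.util.List;

    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class BucketedRead
    {
        // Assumed schema: PRIMARY KEY ((datacentername, second), publish),
        // where second is the second-of-minute (0..59) of the publish time.
        public static List<Row> readRange(Session session, String dc, Date from, Date to)
        {
            List<Row> rows = new ArrayList<>();
            for (int second = 0; second < 60; second++)
            {
                rows.addAll(session.execute(
                    "SELECT * FROM datacenter WHERE datacentername = ? AND second = ?"
                    + " AND publish > ? AND publish < ?",
                    dc, second, from, to).all());
            }
            return rows;
        }
    }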

stone

On Thu, Sep 1, 2016 at 4:54 PM, Carlos Alonso <i...@mrcalonso.com> wrote:

> I guess there's no easy solution for this. The bucketing technique you
> were applying with the publish_pre extra field making a composite partition
> key is probably your best bet but you're right being concerned that all
> your workload will hit the same node during an hour.
>
> I'd then suggest adding a higher-cardinality extra field, such as the seconds
> component of the timestamp. That will spread the load across 60 partitions per
> datacenter and, of course, when querying, you'll have to try all
> partitions. It won't scale forever as the partitions will grow big at some
> point, but if your data is small enough it may work.
>
> Another suggestion would be to use the number of seconds since epoch as
> the extra partitioning key. Instead of querying a range you'll have to
> issue single partition queries for each second within your range. This
> solution may suppose heavier read workloads but will definitely scale as
> the size of the partitions shouldn't be an issue.
>
> I don't think these suggestions will match your requirements off the shelf,
> but hopefully will give you an idea on how to trade off between partition
> size, number of partitions and read strategy to find the sweet spot for
> your use case.
>
> Regards
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 1 September 2016 at 02:58, Stone Fang <cnstonef...@gmail.com> wrote:
>
>> access pattern is
>>
>> select * from datacenter where datacentername = '' and publish > $start_time
>> and publish < $end_time
>>
>> On Wed, Aug 31, 2016 at 8:37 PM, Carlos Alonso <i...@mrcalonso.com>
>> wrote:
>>
>>> Maybe a good question could be:
>>>
>>> Which is your access pattern to this data?
>>>
>>> Carlos Alonso | Software Engineer | @calonso
>>> <https://twitter.com/calonso>
>>>
>>> On 31 August 2016 at 11:47, Stone Fang <cnstonef...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>> I have some questions about how to define the partition and clustering
>>>> keys.
>>>>
>>>> I have a table like this:
>>>>
>>>> CREATE TABLE datacenter (
>>>>     datacentername varchar,
>>>>     publish timestamp,
>>>>     value varchar,
>>>>     PRIMARY KEY (datacentername, publish)
>>>> );
>>>>
>>>>
>>>> *issues:*
>>>> there are only two datacenters, so the data would have only two
>>>> partitions, stored on two nodes. I want to spread the data evenly around
>>>> the cluster.
>>>>
>>>> take this post for reference
>>>> http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling
>>>>
>>>> CREATE TABLE datacenter (
>>>>     datacentername varchar,
>>>>     publish_pre text,
>>>>     publish timestamp,
>>>>     value varchar,
>>>>     PRIMARY KEY ((datacentername, publish_pre), publish)
>>>> );
>>>>
>>>> publish_pre ranges over the hours 1~12. *But the workload is high, and I
>>>> don't want the whole workload for an hour inserted into one node.*
>>>>
>>>> I have no idea how to define a partition key that spreads data evenly
>>>> around the cluster without splitting partitions purely by time, which
>>>> means the data should not all go to one node within a given time window.
>>>>
>>>> thanks
>>>> stone
>>>>
>>>
>>>
>>
>


Re: cassandra database design

2016-08-31 Thread Stone Fang
access pattern is

select * from datacenter where datacentername = '' and publish > $start_time
and publish < $end_time

On Wed, Aug 31, 2016 at 8:37 PM, Carlos Alonso <i...@mrcalonso.com> wrote:

> Maybe a good question could be:
>
> Which is your access pattern to this data?
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 31 August 2016 at 11:47, Stone Fang <cnstonef...@gmail.com> wrote:
>
>> Hi all,
>> I have some questions about how to define the partition and clustering
>> keys.
>>
>> I have a table like this:
>>
>> CREATE TABLE datacenter (
>>     datacentername varchar,
>>     publish timestamp,
>>     value varchar,
>>     PRIMARY KEY (datacentername, publish)
>> );
>>
>>
>> *issues:*
>> there are only two datacenters, so the data would have only two
>> partitions, stored on two nodes. I want to spread the data evenly around
>> the cluster.
>>
>> take this post for reference
>> http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling
>>
>> CREATE TABLE datacenter (
>>     datacentername varchar,
>>     publish_pre text,
>>     publish timestamp,
>>     value varchar,
>>     PRIMARY KEY ((datacentername, publish_pre), publish)
>> );
>>
>> publish_pre ranges over the hours 1~12. *But the workload is high, and I
>> don't want the whole workload for an hour inserted into one node.*
>>
>> I have no idea how to define a partition key that spreads data evenly
>> around the cluster without splitting partitions purely by time, which means
>> the data should not all go to one node within a given time window.
>>
>> thanks
>> stone
>>
>
>


cassandra database design

2016-08-31 Thread Stone Fang
Hi all,
I have some questions about how to define the partition and clustering keys.

I have a table like this:

CREATE TABLE datacenter (
    datacentername varchar,
    publish timestamp,
    value varchar,
    PRIMARY KEY (datacentername, publish)
);


*issues:*
there are only two datacenters, so the data would have only two partitions,
stored on two nodes. I want to spread the data evenly around the cluster.

take this post for reference
http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling

CREATE TABLE datacenter (
    datacentername varchar,
    publish_pre text,
    publish timestamp,
    value varchar,
    PRIMARY KEY ((datacentername, publish_pre), publish)
);

publish_pre ranges over the hours 1~12. *But the workload is high, and I
don't want the whole workload for an hour inserted into one node.*

I have no idea how to define a partition key that spreads data evenly around
the cluster without splitting partitions purely by time, which means the data
should not all go to one node within a given time window.

thanks
stone


Hintedhandoff mutation

2016-08-17 Thread Stone Fang
Hi All,

I want to distinguish a hinted-handoff mutation from a normal write mutation
when I receive a mutation.

How can I get this from the Cassandra source code? I have not found any
attribute about this in the Mutation class.

Or is there no way to get it?
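
The closest I have found so far (my assumption, based on C* 3.x, where hints
appear to arrive on their own verb rather than as plain MUTATION messages):

    import org.apache.cassandra.net.MessageIn;
    import org.apache.cassandra.net.MessagingService;

    public class VerbCheck
    {
        // Verb.HINT carries a HintMessage (see HintVerbHandler), while normal
        // writes arrive as Verb.MUTATION, so the distinction is visible at
        // the messaging layer rather than on the Mutation object itself.
        public static boolean isHint(MessageIn<?> message)
        {
            return message.verb == MessagingService.Verb.HINT;
        }
    }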


thanks
stone


Re: a solution of getting cassandra cross-datacenter latency at a certain time

2016-08-08 Thread Stone Fang
Thanks for your responses.

@Ryan Svihla: yeah, you are right; there will be a TTL, and only the most
recent data will be stored.

@Chris:
good idea on getting the instantaneous cross-datacenter latency, but it seems
there are still issues:
1. Records are generated by application inserts/updates, not on a regular
schedule. Once the network between the application server and Cassandra is
bad, we cannot capture the latency value, because Cassandra cannot tell
whether the application simply has not sent a record or whether the request
failed because of high latency.

2. As you suggested, it seems we cannot get the latency at a certain recent
time. Network latency is usually stable, but when the workload is heavy,
cross-datacenter replication has to wait, which increases the
cross-datacenter latency, so there may be a large difference between this
minute and the next.

stone

On Mon, Aug 8, 2016 at 9:10 PM, Chris Lohfink <clohfin...@gmail.com> wrote:

> If you invoke the values operation on the mbean every minute (or whatever
> period) you can get a histogram of the cross-DC latencies. Just keep
> track of the values of each bin in the histogram and look at the delta from
> the previous time to the current time to find how many latencies occurred in
> each bin's range during the period.
>
> Also, you can wait for CASSANDRA-11752
> <https://issues.apache.org/jira/browse/CASSANDRA-11752> for a
> "recent" histogram (although you would need to apply it to this histogram as
> well).
>
> Chris Lohfink
>
> On Mon, Aug 8, 2016 at 8:50 AM, Ryan Svihla <r...@foundev.pro> wrote:
>
>> The first issue I can think of is the Latency table, if I understand you
>> correctly, has an unbounded size for the partition key of DC and will over
>> time just get larger as more measurements are recorded.
>>
>> Regards,
>>
>> Ryan Svihla
>>
>> On Aug 8, 2016, at 2:58 AM, Stone Fang <cnstonef...@gmail.com> wrote:
>>
>> *objective*: get the Cassandra cross-datacenter latency in real time
>>
>> *existing ticket:*
>>
>> there is a ticket [track cross-datacenter latency](
>> https://issues.apache.org/jira/browse/CASSANDRA-11569),
>> but that is a cumulative statistic since node start; I want to get the
>> instantaneous value at a certain time.
>>
>> *thought*
>>
>> I want to write a message into the **MESSAGE TABLE** from a 1s timer task
>> (the period is similar to most cross-datacenter latencies), and replicate
>> it to the other datacenter. There will be a delay; I capture it and write
>> it to the **LATENCY TABLE**. Then I can query the latency value from this
>> table, filtered on a certain time.
>>
>> *schema*
>>
>> message table for replicating data across datacenters
>>
>>
>> create keyspace heartbeat with replication=
>> {'class':'NetworkTopologyStrategy','dc1':1, 'dc2':1...};
>>
>>
>>
>>  CREATE TABLE HEARTBEAT.MESSAGE (
>>     CREATED TIMESTAMP,
>>     FROMDC VARCHAR,
>>     PRIMARY KEY (CREATED, FROMDC)
>>  );
>>
>> latency table for querying latency values
>>
>>  CREATE TABLE SYSTEM.LATENCY (
>>     FROMDC VARCHAR,
>>     ARRIVED TIMESTAMP,
>>     CREATED TIMESTAMP,
>>     LATENCY BIGINT,
>>     PRIMARY KEY (FROMDC, ARRIVED)
>>  ) WITH CLUSTERING ORDER BY (ARRIVED DESC);
>>
>> *problems*
>>
>> 1. Can this solution work to get the cross-datacenter latency?
>>
>> 2. To create the heartbeat keyspace in the Cassandra bootstrap process, I
>> need to load the heartbeat keyspace in Schema.java and save it into the
>> system schema. I also need to check whether this keyspace already exists
>> after the first node starts, so I don't think this is a good solution.
>>
>> 3. Compared to 1, I tried another solution: generate the heartbeat message
>> in a standalone jar. But I still need to capture the heartbeat message
>> mutation inside Cassandra, i.e., check whether a mutation is about a
>> heartbeat message, and it seems strange to check for a heartbeat keyspace
>> that is defined not in Cassandra but by a third party.
>>
>> I hope to hear your thoughts on this.
>> thanks
>> stone
>>
>>
>


a solution of getting cassandra cross-datacenter latency at a certain time

2016-08-08 Thread Stone Fang
*objective*: get the Cassandra cross-datacenter latency in real time

*existing ticket:*

there is a ticket [track cross-datacenter latency](
https://issues.apache.org/jira/browse/CASSANDRA-11569),
but that is a cumulative statistic since node start; I want to get the
instantaneous value at a certain time.

*thought*

I want to write a message into the **MESSAGE TABLE** from a 1s timer task
(the period is similar to most cross-datacenter latencies), and replicate it
to the other datacenter. There will be a delay; I capture it and write it to
the **LATENCY TABLE**. Then I can query the latency value from this table,
filtered on a certain time.

*schema*

message table for replicating data across datacenters


create keyspace heartbeat with replication=
{'class':'NetworkTopologyStrategy','dc1':1, 'dc2':1...};



 CREATE TABLE HEARTBEAT.MESSAGE (
    CREATED TIMESTAMP,
    FROMDC VARCHAR,
    PRIMARY KEY (CREATED, FROMDC)
 );

latency table for querying latency values

 CREATE TABLE SYSTEM.LATENCY (
    FROMDC VARCHAR,
    ARRIVED TIMESTAMP,
    CREATED TIMESTAMP,
    LATENCY BIGINT,
    PRIMARY KEY (FROMDC, ARRIVED)
 ) WITH CLUSTERING ORDER BY (ARRIVED DESC);
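
A rough sketch of the writer side (hypothetical; it uses the DataStax Java
driver, and the receiving datacenter would record latency = arrival time -
CREATED into the latency table):

    import java.util.Date;

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class HeartbeatWriter
    {
        public static void main(String[] args) throws InterruptedException
        {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();
            String localDc = "dc1"; // assumption: the writer knows its own DC name
            while (true)
            {
                // NetworkTopologyStrategy replicates the row to the other DC,
                // where the arrival is captured and the latency recorded.
                session.execute(
                    "INSERT INTO heartbeat.message (created, fromdc) VALUES (?, ?)",
                    new Date(), localDc);
                Thread.sleep(1000); // the 1s heartbeat period proposed above
            }
        }
    }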

*problems*

1. Can this solution work to get the cross-datacenter latency?

2. To create the heartbeat keyspace in the Cassandra bootstrap process, I
need to load the heartbeat keyspace in Schema.java and save it into the
system schema. I also need to check whether this keyspace already exists
after the first node starts, so I don't think this is a good solution.

3. Compared to 1, I tried another solution: generate the heartbeat message in
a standalone jar. But I still need to capture the heartbeat message mutation
inside Cassandra, i.e., check whether a mutation is about a heartbeat
message, and it seems strange to check for a heartbeat keyspace that is
defined not in Cassandra but by a third party.

I hope to hear your thoughts on this.
thanks
stone


Re: Exclude a host from the repair process

2016-07-21 Thread Stone Fang
@Alain, now we agree on this point: "the repair continuing for ranges not
involving the dead node"; and we agree that it will fail for ranges that
involve the down node.

Actually, I don't think it is a good idea that "those ranges including the
dead node could be repaired between the 2 other replicas that are up
(considering RF=3)".
We don't know which of the two live replicas has the latest record (the down
node may hold the right one), so even if we repair, we just get the latest
record of those two replicas, but not necessarily the right one.


On Thu, Jul 21, 2016 at 5:51 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

>> what I mean is repair will ignore all the replica data that is stored on
>> the down node and continue to repair other data.
>> Once the down node is up, repair can work on the replica data belonging to
>> the down node.
>>
>
> That's what I understood the first time, but thanks for trying to clarify
> :-). I am saying that what I saw in the past was the repair continuing for
> ranges not involving the dead node as you said, but all the failed ranges
> will not be automatically resumed unless you use some external tool as
> OpsCenter repair feature or Spotify Reaper (I believe both handle this by
> putting to the queue all the ranges that fail to be re-run, I might be
> wrong).
>
> If a node is down in my cluster.
>>
>> Is it possible to exclude him from the repair process in order to
>> continue with the repair?
>> If not
>> Is the repair continue reparing the other replicas even if one is down?
>>
>
> It is exactly what this ticket
> https://issues.apache.org/jira/browse/CASSANDRA-10446 is about. Those
> ranges including the dead node could be repaired between the 2 other
> replicas that are up (considering RF=3). The work is to be done, you can
> follow that, vote for it or even develop it if it is something that is
> worth it for you Jean.
>
> C*heers
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
> 2016-07-21 10:28 GMT+02:00 Stone Fang <cnstonef...@gmail.com>:
>
>> Hi
>> @Alain Rodriguez: what I mean is repair will ignore all the replica data
>> that is stored on the down node and continue to repair other data.
>> Once the down node is up, repair can work on the replica data belonging to
>> the down node.
>>
>> Running repair is a daily job driven by an automated script; we don't want
>> to have to exclude the down node, as there are always down nodes in the
>> cluster.
>>
>> thanks
>> stone
>>
>> On Wed, Jul 20, 2016 at 7:20 PM, Amit Singh F <amit.f.si...@ericsson.com>
>> wrote:
>>
>>> Hi Jean,
>>>
>>>
>>>
>>> This option is available in C* version 2.1.x & above, where you can
>>> specify hosts in nodetool  repair command . For more detail please visit
>>> the link below :
>>>
>>>
>>>
>>>
>>> https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsRepair.html
>>>
>>>
>>>
>>> Regards
>>>
>>> Amit Singh
>>>
>>>
>>>
>>> *From:* Alain RODRIGUEZ [mailto:arodr...@gmail.com]
>>> *Sent:* Wednesday, July 20, 2016 4:42 PM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: Exclude a host from the repair process
>>>
>>>
>>>
>>> Hi Jean,
>>>
>>>
>>>
>>> All the nodes are not necessary involved in a repair depending on vnodes
>>> being enabled or not, on your topology, on the racks you are using etc.
>>>
>>>
>>>
>>> This being said, if a node was supposed to be part of a repair process,
>>> the repair of all the subranges including the down node will fail. That's
>>> what I have seen happening so far. @Stone Fang, not sure who is right on
>>> this (I might have missed some information about this topic), but there is
>>> a ticket about this topic:
>>> https://issues.apache.org/jira/browse/CASSANDRA-10446. You apparently
>>> can specify which nodes to repair, but a down node is not automatically
>>> ignored as far as I can tell.
>>>
>>>
>>>
>>> C*heers,
>>>
>>> ---
>>>
>>> Alain Rodriguez - al...@thelastpickle.com
>>>
>>> France
>>>
>>>
>>>
>>> The Last Pickle - Apache Cassandra Consulting
>>>
>>> http://www.thelastpickle.com
>>>
>>>
>>>
>>> 2016-07-14 9:16 GMT+02:00 Stone Fang <cnstonef...@gmail.com>:
>>>
>>> I don't think it is necessary to remove the down node.
>>>
>>> The repair will continue, comparing with the other up nodes, and ignore
>>> the down node.
>>>
>>>
>>>
>>> On Wed, Jul 13, 2016 at 9:44 PM, Jean Carlo <jean.jeancar...@gmail.com>
>>> wrote:
>>>
>>> If a node is down in my cluster.
>>>
>>> Is it possible to exclude him from the repair process in order to
>>> continue with the repair?
>>>
>>> If not
>>>
>>> Is the repair continue reparing the other replicas even if one is down?
>>>
>>> Best regards
>>>
>>>
>>>
>>> Jean Carlo
>>>
>>>
>>> "The best way to predict the future is to invent it" Alan Kay
>>>
>>>
>>>
>>>
>>>
>>
>>
>


multi datacenter improvement

2016-07-21 Thread Stone Fang
Hi All,
I am thinking about the issues around Cassandra multi-datacenter
deployments, and have opened a ticket to track this; your input is welcome:
https://issues.apache.org/jira/browse/CASSANDRA-12257

*Environment*
An active-active Cassandra datacenter pair, with write consistency level
LOCAL_QUORUM for fast request responses.

*Concern*
We don't know when data written in one datacenter arrives in the other.
We don't know the size of the data that still needs to be transferred to the
other datacenter.

*Scenario*
One project needs to collect information from sensors in different regions.

There are 2 datacenters, DC1 and DC2. sensor1 and sensor2 are in DC1's
region; sensor3 and sensor4 are in DC2's.

One client in DC1 pulls data every 10 minutes.
1. sensor3 in DC2 writes a record to DC2 at 8:59:55, and it arrives in DC1 at
9:00:05.
2. The client in DC1 pulls data at 9:00. It should get the record, but it
cannot, as the record has not yet arrived in DC1.
3. The client in DC1 then pulls data at 9:10. It still cannot get the record,
because it pulls data from 9:00-9:10, but the record was created at 8:59:55.

So we miss the record. We need to measure the latency so that we can look
back far enough when pulling data.

*Thought*
1. We can measure latency with ping or other monitoring tools, but that does
not represent the latency of Cassandra data being transferred from one DC to
another.

2. We can measure the latency between DCs with CASSANDRA-11569, but that is a
value accumulated since node start; it cannot represent the latency right
now.
https://issues.apache.org/jira/browse/CASSANDRA-11569

3. Cassandra may need to insert a record into a system table periodically, so
we can know precisely when the data arrived.
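
To illustrate point 3: once the arrival latency is known, the pulling client
can widen its window accordingly (a hypothetical sketch; duplicates from the
overlap must be filtered client-side):

    import java.time.Duration;
    import java.time.Instant;

    public class PullWindow
    {
        // Widen each pull window by the measured (or worst-case) cross-DC
        // replication latency so that late-arriving records are not missed.
        public static Instant[] nextWindow(Instant lastPull, Duration maxLatency)
        {
            Instant from = lastPull.minus(maxLatency); // look back past the last pull
            return new Instant[] { from, Instant.now() };
        }
    }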

thanks in advance

stone


Re: Exclude a host from the repair process

2016-07-21 Thread Stone Fang
Hi,
@Alain Rodriguez: what I mean is repair will ignore all the replica data that
is stored on the down node and continue to repair other data. Once the down
node is up, repair can work on the replica data belonging to the down node.

Running repair is a daily job driven by an automated script; we don't want to
have to exclude the down node, as there are always down nodes in the cluster.

thanks
stone

On Wed, Jul 20, 2016 at 7:20 PM, Amit Singh F <amit.f.si...@ericsson.com>
wrote:

> Hi Jean,
>
>
>
> This option is available in C* version 2.1.x & above, where you can
> specify hosts in nodetool  repair command . For more detail please visit
> the link below :
>
>
>
> https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsRepair.html
>
>
>
> Regards
>
> Amit Singh
>
>
>
> *From:* Alain RODRIGUEZ [mailto:arodr...@gmail.com]
> *Sent:* Wednesday, July 20, 2016 4:42 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Exclude a host from the repair process
>
>
>
> Hi Jean,
>
>
>
> All the nodes are not necessary involved in a repair depending on vnodes
> being enabled or not, on your topology, on the racks you are using etc.
>
>
>
> This being said, if a node was supposed to be part of a repair process,
> the repair of all the subranges including the down node will fail. That's
> what I have seen happening so far. @Stone Fang, not sure who is right on
> this (I might have missed some information about this topic), but there is
> a ticket about this topic:
> https://issues.apache.org/jira/browse/CASSANDRA-10446. You apparently can
> specify which nodes to repair, but a down node is not automatically ignored
> as far as I can tell.
>
>
>
> C*heers,
>
> ---
>
> Alain Rodriguez - al...@thelastpickle.com
>
> France
>
>
>
> The Last Pickle - Apache Cassandra Consulting
>
> http://www.thelastpickle.com
>
>
>
> 2016-07-14 9:16 GMT+02:00 Stone Fang <cnstonef...@gmail.com>:
>
> I don't think it is necessary to remove the down node.
>
> The repair will continue, comparing with the other up nodes, and ignore the
> down node.
>
>
>
> On Wed, Jul 13, 2016 at 9:44 PM, Jean Carlo <jean.jeancar...@gmail.com>
> wrote:
>
> If a node is down in my cluster.
>
> Is it possible to exclude him from the repair process in order to continue
> with the repair?
>
> If not
>
> Is the repair continue reparing the other replicas even if one is down?
>
> Best regards
>
>
>
> Jean Carlo
>
>
> "The best way to predict the future is to invent it" Alan Kay
>
>
>
>
>


CassandraDaemon&

2016-07-19 Thread Stone Fang
Hi All,
I have two questions about CassandraDaemon.

1. I am confused about EmbeddedCassandraService.java. I think it should
provide a way to stop the CassandraDaemon; I found a ticket about this,
https://issues.apache.org/jira/browse/CASSANDRA-7595, but it was not accepted
in C* 3.x. Does anyone know why?

2. I have started a CassandraDaemon instance from a Gradle task like this:

    cassandra = new CassandraDaemon();
    try
    {
        cassandra.init(null);
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
    cassandra.completeSetup();
    cassandra.startNativeTransport();

It does start, and I can connect. But after the task finished, I could not
find the Cassandra instance before running the stop task: it had already
deactivated, and I can run the start-cassandra task again. How can this be
explained?
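
My guess is that the daemon runs inside the Gradle task's JVM and dies when
that JVM exits. For an explicit stop from the same JVM, something like this
(assuming the C* 3.x CassandraDaemon API):

    // Shut the embedded daemon down before the task's JVM exits.
    cassandra.stopNativeTransport();
    cassandra.deactivate();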

thanks
stone


Re: how to start a embed cassandra instance?

2016-07-14 Thread Stone Fang
Achilles is a good project, but it is heavy. Actually I just need to:
1. start/stop an embedded standalone Cassandra node;
2. start/stop an embedded Cassandra cluster.

It seems CassandraDaemon can only start a standalone node, not a cluster.
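
For reference, a minimal JUnit sketch of the CassandraDaemon approach Praveen
describes below (my assumption: cassandra.config and writable storage
directories are pointed at test locations via system properties):

    import java.io.IOException;

    import org.apache.cassandra.service.CassandraDaemon;
    import org.junit.AfterClass;
    import org.junit.BeforeClass;

    public class EmbeddedCassandraTest
    {
        private static CassandraDaemon daemon;

        @BeforeClass
        public static void startCassandra() throws IOException
        {
            // Assumes -Dcassandra.config=file:///path/to/test-cassandra.yaml
            // (and writable data/commitlog directories) for the test JVM.
            daemon = new CassandraDaemon();
            daemon.init(null);
            daemon.start();
        }

        @AfterClass
        public static void stopCassandra()
        {
            daemon.deactivate();
        }
    }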


On Wed, Jul 13, 2016 at 8:31 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> As for Achilles, no I start Cassandra in the same JVM. It is meant to be
> used for testing purpose only. I also faced dependency issue with different
> version of Guava so I excluded the Guava pulled by the Datastax Java driver
> to use the one pulled by C* itself:
> https://github.com/doanduyhai/Achilles/blob/master/achilles-embedded/pom.xml#L52-L55
>
>
>
> On Wed, Jul 13, 2016 at 2:28 PM, Ken Hancock <ken.hanc...@schange.com>
> wrote:
>
>> Do either cassandra-unit or Achilles fork Cassandra to a separate JVM?
>> Guava libraries create a dependency hell with our current use of Hector's
>> embedded server.  We're starting to migrate to the Datastax Java driver
>> with yet another guava version.  I know Farsandra supports forking, so that
>> was where I was thinking of going first.
>>
>>
>>
>>
>> On Tue, Jul 12, 2016 at 9:37 AM, DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>>
>>> If you're looking something similar to cassandra-unit with Apache 2
>>> licence, there is a module in Achilles project that provides the same
>>> thing:
>>> https://github.com/doanduyhai/Achilles/wiki/CQL-embedded-cassandra-server
>>>
>>> On Tue, Jul 12, 2016 at 12:56 PM, Peddi, Praveen <pe...@amazon.com>
>>> wrote:
>>>
>>>> We do something similar by starting CassandraDaemon class directly (you
>>>> would need to provide a yaml file though). You can start and stop
>>>> CassandraDaemon class from your unit test (typically @BeforeClass).
>>>>
>>>> Praveen
>>>>
>>>> On Jul 12, 2016, at 3:30 AM, Stone Fang <cnstonef...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>> how to start a embed cassandra instance?so we can do a unit test on
>>>> local,dont need to start a
>>>> cassandra server.
>>>>
>>>> https://github.com/jsevellec/cassandra-unit this project is good,but
>>>> the license is not suitable.
>>>> how do you achieve this?
>>>>
>>>> thanks in advance
>>>>
>>>> stone
>>>>
>>>>
>>>
>>
>>
>> --
>> *Ken Hancock *| System Architect, Advanced Advertising
>> SeaChange International
>> 50 Nagog Park
>> Acton, Massachusetts 01720
>> ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC
>> Office: +1 (978) 889-3329
>>
>
>


Re: Exclude a host from the repair process

2016-07-14 Thread Stone Fang
I don't think it is necessary to remove the down node.
The repair will continue, comparing with the other up nodes, and ignore the
down node.

On Wed, Jul 13, 2016 at 9:44 PM, Jean Carlo 
wrote:

> If a node is down in my cluster.
>
> Is it possible to exclude him from the repair process in order to continue
> with the repair?
> If not
> Is the repair continue reparing the other replicas even if one is down?
>
> Best regards
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>


how to start a embed cassandra instance?

2016-07-12 Thread Stone Fang
Hi,
How do I start an embedded Cassandra instance, so that we can run unit tests
locally without needing to start a Cassandra server?

https://github.com/jsevellec/cassandra-unit is a good project, but the
license is not suitable. How do you achieve this?

thanks in advance

stone


archive cassandra data

2016-07-10 Thread Stone Fang
Hi all,

I have some thoughts on this issue:
https://issues.apache.org/jira/browse/CASSANDRA-8460

It is an old ticket, created on 11/Dec/14, and I still have not received a
reply several days after commenting. Does anybody know how to move this
ticket forward?

thanks in advance!

stone


Re: How to get current value of commitlog_segment_size_in_mb?

2016-07-07 Thread Stone Fang
commitlog_segment_size_in_mb is configured in cassandra.yaml; I don't think
it is stored in a Cassandra system table.
The following is an introduction to the Cassandra system tables:
https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_query_system_c.html


On Fri, Jul 8, 2016 at 4:23 AM, Jaydeep Chovatia  wrote:

> Hi,
>
> In my project I need to read current value for
> "commitlog_segment_size_in_mb", I am looking for CQL query to do this. Any
> idea if this information gets stored in any of the Cassandra system table?
>
> Thanks,
> Jaydeep
>


Re: whats the default .yaml file that cassandra-stress uses

2016-07-07 Thread Stone Fang
There is a sample profile for testing, "cqlstress-example.yaml", under the
tools folder. You can customize that file for your test.
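
For example (treat this as a sketch; exact flags vary by version):

    cassandra-stress user profile=tools/cqlstress-example.yaml ops(insert=1) n=100000 -node 127.0.0.1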

stone.

On Thu, Jul 7, 2016 at 4:28 PM, Daiyue Weng  wrote:

> Hi, I am wondering what's the default .yaml file that cassandra-stress
> uses when testing writes and reads when command 'profile=' is not
> specified. Is it the cassandra.yaml? Does it affect the performance of
> cassandra-stress test by modifying/tuning it?
>
> ps.I am running cassandra instances on Linux, the path to
> cassandra.yaml that I found is
>
> /etc/cassandra/cassandra.yaml
>
> thanks
>