Re: how to make unique columns in cassandra

2015-03-02 Thread Ajaya Agrawal
Please be clear and spend some time writing your questions so that other
people know what you are trying to ask. I can't read your mind. :)

Back to your question:
Assuming that you need to search based on the values of the unique column,
invert the index in the auxiliary table: instead of a (phone_number,
user_id) key you would have a (user_id, phone_number) key. Then query the
auxiliary table first, and then the user table if you want other columns.
You can also replicate the other columns into the auxiliary table to avoid
multiple queries.
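
For illustration, a minimal CQL sketch of the two-table idea from below
(names invented; note that for IF NOT EXISTS to guard uniqueness, the phone
number must be the entire primary key of the auxiliary table):

CREATE TABLE users (user_id text PRIMARY KEY, phone_number text, name text);
CREATE TABLE users_by_phone (phone_number text PRIMARY KEY, user_id text);

-- Claim the phone number first; this returns [applied] = false if taken.
INSERT INTO users_by_phone (phone_number, user_id)
VALUES ('123', 'rahul') IF NOT EXISTS;

-- Only if the claim applied, write the main row:
INSERT INTO users (user_id, phone_number, name) VALUES ('rahul', '123', 'Rahul');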

Cheers,
Ajaya

On Mon, Mar 2, 2015 at 12:53 PM, Rahul Srivastava <
srivastava.robi...@gmail.com> wrote:

> but what if I want to fetch the value using one table? Then this idea
> might fail.
>
> On Mon, Mar 2, 2015 at 12:46 PM, Ajaya Agrawal  wrote:
>
>> Make a table for each of the unique keys, e.g.:
>>
>> If the primary key of the user table is user_id and you want the phone
>> number column to be unique, then create another table wherein the primary
>> key is (phone_number, user_id). Before inserting into the main table, try
>> to insert into this table first with an "if not exists" clause. If it
>> succeeds, then go ahead with your insert into the user table. Similarly,
>> while deleting a row from the primary table, delete the corresponding row
>> in all other tables. The order of insertion into the tables matters here;
>> otherwise you would end up inducing race conditions.
>>
>> The catch here is that you should never update the unique column. If you
>> do, you would have to use locks, and if there are multiple nodes running
>> your application then you would need a distributed lock. I would suggest
>> not updating the unique columns. Instead, force your users to delete the
>> entry and recreate it. If you can't do that, you need to evaluate your
>> choice of database. Perhaps a relational database would be better suited
>> to your requirements.
>>
>> Hope this helps!
>>
>> -Ajaya
>>
>> Cheers,
>> Ajaya
>>
>> On Fri, Feb 27, 2015 at 5:26 PM, ROBIN SRIVASTAVA <
>> srivastava.robi...@gmail.com> wrote:
>>
>>> I want a unique constraint in Cassandra: I want all the values in my
>>> column family to be unique, e.g.: name-rahul phone-123 address-abc.
>>>
>>> Now I want that no row with values equal to rahul, 123, or abc gets
>>> inserted again. Searching on DataStax I found that I can achieve this by
>>> querying on the partition key with IF NOT EXISTS, but I haven't found a
>>> solution for keeping all three values unique. That means if name-jacob
>>> phone-123 address-qwe arrives, it should also not be inserted into my
>>> database, since its phone column has the same value as the row with
>>> name-rahul.
>>>
>>
>>
>


Re: How to extract all the user id from a single table in Cassandra?

2015-03-02 Thread Jens Rantil
Hi Check,

Please avoid double posting on mailing lists. It leads to double work
(respect people's time!) and makes it hard for people who hit the same
issue in the future to follow the discussion and answers.

That said, if you have a lot of primary keys

select user_id from testkeyspace.user_record;

will most definitely timeout. Have a look at `SELECT DISTINCT` at [1]. More
importantly, for larger datasets you will also need to split the token
space into smaller segments and iteratively select your primary keys. See
[2].

[1]
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html
[2]
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html?scroll=reference_ds_d35_v2q_xj__paging-through-unordered-results
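
For illustration, roughly what that looks like in CQL (a sketch only,
assuming Murmur3Partitioner, whose tokens span -2^63 to 2^63 - 1):

SELECT DISTINCT user_id FROM testkeyspace.user_record;

-- Paging through the token space one segment at a time:
SELECT DISTINCT user_id FROM testkeyspace.user_record
WHERE token(user_id) > -9223372036854775808
AND token(user_id) <= -4611686018427387904;
-- ...then repeat with the next segment until the whole ring is covered.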

If you are having specific issues with the Java Driver I suggest you ask on
that mailing list (only).

Cheers,
Jens

On Sun, Mar 1, 2015 at 6:38 PM, Check Peck  wrote:

> Sending again as I didn't get any response on this.
>
> Any thoughts?
>
> On Fri, Feb 27, 2015 at 8:24 PM, Check Peck 
> wrote:
>
>> I have a Cassandra table like this -
>>
>> create table user_record (user_id text, record_name text,
>> record_value blob, primary key (user_id, record_name));
>>
>> What is the best way to extract all the user_ids from this table? As of
>> now, I cannot change my data model for this exercise, so I need to find a
>> way to extract all the user_ids from the above table.
>>
>> I am using the DataStax Java driver in my project. Apart from code, is
>> there any other easy way, through some cqlsh utility, to extract all the
>> user_ids from the above table and dump them into a file?
>>
>> I am thinking the below code might time out after some time -
>>
>> import java.util.HashSet;
>> import java.util.Iterator;
>> import java.util.Set;
>>
>> import com.datastax.driver.core.*;
>> import com.datastax.driver.core.Cluster.Builder;
>> import com.datastax.driver.core.policies.*;
>>
>> public class TestCassandra {
>>
>>     private Session session = null;
>>     private Cluster cluster = null;
>>
>>     // Lazy-loaded singleton holder.
>>     private static class ConnectionHolder {
>>         static final TestCassandra connection = new TestCassandra();
>>     }
>>
>>     public static TestCassandra getInstance() {
>>         return ConnectionHolder.connection;
>>     }
>>
>>     private TestCassandra() {
>>         Builder builder = Cluster.builder();
>>         builder.addContactPoints("127.0.0.1");
>>
>>         PoolingOptions opts = new PoolingOptions();
>>         opts.setCoreConnectionsPerHost(HostDistance.LOCAL,
>>                 opts.getCoreConnectionsPerHost(HostDistance.LOCAL));
>>
>>         cluster = builder.withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
>>                 .withPoolingOptions(opts)
>>                 .withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy("PI")))
>>                 .withReconnectionPolicy(new ConstantReconnectionPolicy(100L))
>>                 .build();
>>         session = cluster.connect();
>>     }
>>
>>     private Set<String> getRandomUsers() {
>>         Set<String> userList = new HashSet<String>();
>>
>>         String sql = "select user_id from testkeyspace.user_record;";
>>
>>         try {
>>             SimpleStatement query = new SimpleStatement(sql);
>>             query.setConsistencyLevel(ConsistencyLevel.ONE);
>>             ResultSet res = session.execute(query);
>>
>>             Iterator<Row> rows = res.iterator();
>>             while (rows.hasNext()) {
>>                 Row r = rows.next();
>>                 String user_id = r.getString("user_id");
>>                 userList.add(user_id);
>>             }
>>         } catch (Exception e) {
>>             System.out.println("error= " + e);
>>         }
>>
>>         return userList;
>>     }
>> }
>>
>> Adding the java-driver group and the Cassandra group as well, to see
>> whether there is any better way to do this.
>>
>
>


-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se



Re: using or in select query in cassandra

2015-03-02 Thread Jens Rantil
Hi Rahul,

No, you can't do this in a single query. You will need to execute two
separate queries if the restrictions are on different columns. However, if
you'd like to select multiple rows with a restriction on the same column,
you can do that using the `IN` construct:

select * from table where id IN (123,124);

See [1] for reference.

[1]
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html

Cheers,
Jens

On Mon, Mar 2, 2015 at 7:06 AM, Rahul Srivastava <
srivastava.robi...@gmail.com> wrote:

> Hi
>  I want to enforce uniqueness for my data, so I need to add an OR clause
> to my WHERE clause, e.g.:
> select * from table where id = 123 OR name = 'abc'
> In the above I want to get data if my id is 123 or my name is abc.
>
> Is there any way in Cassandra to achieve this?
>
>


-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se



Re: how to make unique columns in cassandra

2015-03-02 Thread Peter Lin
Use an RDBMS.

There is a reason constraints were created, and a reason Cassandra doesn't
have them.

Sent from my iPhone

> On Mar 2, 2015, at 2:23 AM, Rahul Srivastava wrote:
> 
> but what if I want to fetch the value using one table? Then this idea might fail.


Running Cassandra on mixed OS

2015-03-02 Thread SEAN_R_DURITY
Cassandra 1.2.13+/2.0.12

Have any of you run a single Cassandra cluster on a mix of OS (Red Hat 5 and 6, 
for example), but with the same JVM? Any issues or concerns? If there are 
problems, how do you handle OS upgrades?



Sean R. Durity





Re: Composite Keys in cassandra 1.2

2015-03-02 Thread Kai Wang
AFAIK it's not possible. The fact that you need to query the data by a
partial row key indicates your data model isn't right. What are your
typical queries on the data?

On Sun, Mar 1, 2015 at 7:24 AM, Yulian Oifa  wrote:

> Hello to all.
> Let's assume a scenario where the key is a compound type with 3 types in
> it (Long, UTF8, UTF8).
> Each row stores timeuuids as column names and empty values.
> Is it possible to retrieve data by a single key part (for example by the
> long only) using Java Thrift?
>
> Best regards
> Yulian Oifa
>
>
>


Re: Optimal Batch size (Unlogged) for Java driver

2015-03-02 Thread Ajay
I have a column family with 15 columns: a timestamp, a timeuuid, a few text
fields, and the rest int fields. If I calculate the size of each column
name and its value, and divide 5 KB (the recommended max size for a batch)
by that total, I get 12 as the result. Is that correct? Am I missing
something?

Thanks
Ajay
On 02-Mar-2015 12:13 pm, "Ankush Goyal"  wrote:

> Hi Ajay,
>
> I would suggest looking at the approximate size of the individual
> elements in the batch, and based on that computing a max size (chunk
> size).
>
> It's not really a straightforward calculation, so I would further suggest
> making that chunk size a runtime parameter that you can tweak and play
> around with until you reach a stable state.
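
As a rough sketch of that chunking idea (assuming the DataStax Java driver
2.x; the class name and the "batch.chunk.size" property are invented):

import java.util.List;

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Session;

public class ChunkedWriter {
    // The chunk size is a runtime parameter so it can be tuned without redeploying.
    private static final int CHUNK_SIZE = Integer.getInteger("batch.chunk.size", 12);

    public static void writeChunked(Session session, List<BoundStatement> statements) {
        BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
        for (BoundStatement stmt : statements) {
            batch.add(stmt);
            if (batch.size() >= CHUNK_SIZE) {
                session.execute(batch); // flush one chunk
                batch.clear();
            }
        }
        if (batch.size() > 0) {
            session.execute(batch); // flush the remainder
        }
    }
}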
>
> On Sunday, March 1, 2015 at 10:06:55 PM UTC-8, Ajay Garga wrote:
>>
>> Hi,
>>
>> I am looking at a way to compute the optimal batch size on the client
>> side, similar to the below-mentioned issue on the server side (it has to
>> be generic, as we are exposing REST APIs for Cassandra and the column
>> family and data are different for each request).
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-6487
>>
>> How do we compute (approximately, using ColumnDefinitions or
>> ColumnMetadata) the size of a row of a column family from the client
>> side using the Cassandra Java driver?
>>
>> Thanks
>> Ajay
>>


Re: Optimal Batch size (Unlogged) for Java driver

2015-03-02 Thread Ajay
Hi Ankush,

We are already using prepared statements, and our case is time-series data
as well.

Thanks
Ajay
On 02-Mar-2015 10:00 pm, "Ankush Goyal"  wrote:

> Ajay,
>
> First of all, I would recommend using PreparedStatements, so you would
> only be sending the bound variable arguments over the wire. Second, I
> think that 5 KB limit for the WARN is too restrictive, and you could tune
> that on the Cassandra server side. I think if all you have is 15 columns
> (as long as their values are sanitized and do not go over certain
> limits), it should be fine to send all of them over at the same time.
> Chunking is necessary when you have time-series type data (for writes)
> OR you might be reading a lot of data via an IN query.


Re: using or in select query in cassandra

2015-03-02 Thread Jonathan Haddad
I'd like to add that in() is usually a bad idea.  It is convenient, but not
really what you want in production.  Go with Jens' original suggestion of
multiple queries.

I recommend reading Ryan Svihla's post on why in() is generally a bad
thing:
http://lostechies.com/ryansvihla/2014/09/22/cassandra-query-patterns-not-using-the-in-query-for-multiple-partitions/



Datastax Agent 5.1+ Configuration

2015-03-02 Thread Robert Halstead
I recently attempted to get our Cassandra instances talking securely to one
another, with SSL OpsCenter communication. We are using DSE 4.6 and
OpsCenter 5.1. While a lot of the DataStax documentation is fairly good,
when it comes to advanced configuration topics or security configuration I
find the docs very lacking.

I set up a 3-node cluster with SSL encryption between nodes and password
authentication turned on. Obviously, you then need to set up the user/pass
in the agent configuration as well. These used to be thrift_user and
thrift_pass (or something along those lines), and the SSL options were
thrift_keystore / thrift_truststore, etc.

In OpsCenter 5.1, the system changed from using Thrift to the native
interface. However, there is nothing in the docs about which agent
properties you need to set for SSL security and authentication.

After my dealings with DataStax Support, I thought I would post this here
until they update their documentation.

Agent configuration (address.yaml)

C* connection options

IP addresses

Before 5.1, we were using either thrift_rpc_interface (when storing
metrics/settings in the same cluster) or storage_thrift_hosts (separate
cluster) to determine which IP to use to connect to C*. In 5.1, both
options were replaced with hosts, which accepts an array of strings
(including an array with a single string for the same-cluster case) instead
of a single string:

hosts: ["123.234.111.11", "10.1.1.1"]

C* port
storage_thrift_port was removed; thrift_port was replaced by cassandra_port.

C* autodiscovery
autodiscovery_enabled, autodiscovery_interval, and storage_dc were removed.
Autodiscovery can't really be disabled for our java-driver, but we never
connect to hosts that are not specified in the agent's config.

Misc
thrift_socket_timeout and thrift_conn_timeout were removed.

C*/DSE security
PLAINTEXT AUTH
thrift_user, storage_thrift_user, thrift_pass, and storage_thrift_pass
were replaced by cassandra_user & cassandra_pass.

ENCRYPTION
thrift_ssl_truststore and thrift_ssl_truststore_password were replaced
by ssl_keystore and ssl_keystore_password, respectively.
thrift_ssl_truststore_type, thrift_max_frame_size were removed.

KERBEROS
We completely changed the way we set up Kerberos (I thought it was
documented, but apparently it wasn't). We removed everything
kerberos-related from the config except for a single option,
kerberos_service. When it's set (to the Kerberos service name), Kerberos is
used. All the configuration takes place in the kerberos.config file.
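
Pulling the above together, a sample address.yaml for a password-protected,
SSL-enabled cluster might look like this (all values invented):

hosts: ["10.1.1.1", "10.1.1.2"]
cassandra_port: 9042
cassandra_user: opscenter_agent
cassandra_pass: secret
ssl_keystore: /etc/datastax-agent/keystore
ssl_keystore_password: secret
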
opscenterd cluster configs

[cassandra]
send_thrift_rpc was renamed to thrift_rpc

[agents]
thrift_ssl_truststore and thrift_ssl_truststore_password were renamed
to ssl_keystore and ssl_keystore_password, respectively.
thrift_ssl_truststore_type was removed.

Hopefully this will be helpful for those running the latest OpsCenter who
want a secure setup.

Thanks to datastax for the help in this matter.


Re: using or in select query in cassandra

2015-03-02 Thread Robert Wille
I would also like to add that if you avoid IN and use async queries
instead, it is pretty trivial to use a semaphore or some other limiting
mechanism to put a ceiling on the amount of concurrent work you are sending
to the cluster. If you use a query with an IN clause with a thousand
things, you'll make the cluster look for a thousand records concurrently.
If you issue a thousand async queries and use a limiting mechanism, then
you can control how much load you are placing on the server.

I built a nice wrapper around the Session object, and one of the things
built into the wrapper is the ability to limit the number of concurrent
async queries. It's a really nice and simple feature to have.

Robert




Re: Running Cassandra on mixed OS

2015-03-02 Thread Jonathan Haddad
I would really not recommend this. There are enough issues that can come up
with a distributed database to make it hard to pinpoint problems.

In an ideal world, every machine would be completely identical. Don't set
yourself up for failure. Pin the OS & all packages to specific versions.

On Mon, Mar 2, 2015 at 6:44 AM  wrote:

>  Cassandra 1.2.13+/2.0.12
>
>
>
> Have any of you run a single Cassandra cluster on a mix of OS (Red Hat 5
> and 6, for example), but with the same JVM? Any issues or concerns? If
> there are problems, how do you handle OS upgrades?
>
>
>
>
>
>
>
> Sean R. Durity
>


RE: Running Cassandra on mixed OS

2015-03-02 Thread SEAN_R_DURITY
This is not for the long haul; it's to accomplish an OS upgrade across the
cluster without taking an outage.

Sean Durity



Re: Node stuck in joining the ring

2015-03-02 Thread Nate McCall
Can you verify that cassandra-rackdc.properties and
cassandra-topology.properties are the same across the cluster?

On Thu, Feb 26, 2015 at 7:52 AM, Batranut Bogdan  wrote:

> No errors in the system.log file
> [root@cassa09 cassandra]# grep "ERROR" system.log
> [root@cassa09 cassandra]#
>
> Nothing.
>
>
>   On Thursday, February 26, 2015 1:55 PM, mck  wrote:
>
>
> Any errors in your log file?
>
> We saw something similar when bootstrap crashed when rebuilding
> secondary indexes.
>
> See CASSANDRA-8798
>
> ~mck
>
>
>
>


-- 
-
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Running Cassandra on mixed OS

2015-03-02 Thread Robert Coli
On Mon, Mar 2, 2015 at 6:43 AM,  wrote:

>  Have any of you run a single Cassandra cluster on a mix of OS (Red Hat 5
> and 6, for example), but with the same JVM? Any issues or concerns? If
> there are problems, how do you handle OS upgrades?
>

If you are running the same version of Cassandra in both cases, you are
probably fine. As you point out, one must inevitably upgrade one's OS; I
recently went from Ubuntu 10.04 to 12.04 (with the associated 1.6/1.7 JVMs)
without any problems.

But you should of course do any such activity in QA and staging and let it
burn in for a while before doing so in prod.

=Rob


Re: sstables remain after compaction

2015-03-02 Thread Robert Coli
On Sat, Feb 28, 2015 at 5:39 PM, Jason Wee  wrote:

> Hi Rob, sorry for the late response, festive season here. cassandra
> version is 1.0.8 and thank you, I will read on the READ_STAGE threads.
>

1.0.8 is pretty seriously old in 2015. I would upgrade to at least 1.2.x
(via 1.1.x) ASAP. Your cluster will be much happier, in general.

=Rob


set selinux context for cassandra to talk to website

2015-03-02 Thread Tim Dunphy
Hey all,

 Ok, I have a website being powered by Cassandra 2.1.3. And I notice that
if SELinux is set to off, the site works beautifully! However, as soon as I
set SELinux to on, I see the following error:

Warning: require_once(/McFrazier/PhpBinaryCql/CqlClient.php): failed to
open stream: Permission denied in
/var/www/jf-ref/includes/classes/class.CQL.php on line 2 Fatal error:
require_once(): Failed opening required
'/McFrazier/PhpBinaryCql/CqlClient.php' (include_path='.:/php/includes') in
/var/www/jf-ref/includes/classes/class.CQL.php on line 2

I'm just wondering how I can get SELinux to allow the web server to talk to
Cassandra?

Thanks
Tim

-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: do we need to disable 'rpc_keepalive' if 'rpc_max_threads' gets larger?

2015-03-02 Thread Robert Coli
On Sun, Mar 1, 2015 at 6:40 PM, pprun  wrote:

> rpc_max_threads is set to 2048 and the 'rpc_server_type' is 'hsha', after
> 2 days running, observed that there's a high I/O activity and the number of
> 'RCP thread' grow to '2048' and VisualVm shows most of them is
> 'waiting'/'sleeping' (color: yellow).
>
> I want to know if I set rpc_keepalive to false, disable it, this will help
> to shrink the idle rpc threads?
>
> I remembered java 8 comes with newWorkStealingPool: number of threads may
> grow and shrink dynamically.
>

What version of Cassandra, and hsha or sync? How many client threads do you
have?

=Rob


Reboot: Read After Write Inconsistent Even On A One Node Cluster

2015-03-02 Thread Dan Kinder
Hey all,

I have been having the same problem as in this older post:
http://mail-archives.apache.org/mod_mbox/cassandra-user/201411.mbox/%3CCAORswtz+W4Eg2CoYdnEcYYxp9dARWsotaCkyvS5M7+Uo6HT1=a...@mail.gmail.com%3E

To summarize: on my local box with just one Cassandra node, I can update a
row and then select it and get an incorrect (stale) response.

My understanding is this may have to do with not having fine-grained enough
timestamp resolution, but regardless I'm wondering: is this actually a bug,
or is there any way to mitigate it? It causes sporadic failures in our unit
tests, and having to Sleep() between tests isn't ideal. At least confirming
it's a bug would be nice, though.

For those interested, here's a little Go program that can reproduce the
issue. When I run it I typically see:
Expected 100 but got: 99
Expected 1000 but got: 999

--- main.go: ---

package main

import (
    "fmt"

    "github.com/gocql/gocql"
)

func main() {
    cf := gocql.NewCluster("localhost")
    db, _ := cf.CreateSession()
    // Keyspace ut = "update test"
    err := db.Query(`CREATE KEYSPACE IF NOT EXISTS ut
        WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1}`).Exec()
    if err != nil {
        panic(err.Error())
    }
    err = db.Query("CREATE TABLE IF NOT EXISTS ut.test (key text, val text, PRIMARY KEY(key))").Exec()
    if err != nil {
        panic(err.Error())
    }
    err = db.Query("TRUNCATE ut.test").Exec()
    if err != nil {
        panic(err.Error())
    }
    err = db.Query("INSERT INTO ut.test (key) VALUES ('foo')").Exec()
    if err != nil {
        panic(err.Error())
    }

    for i := 0; i < 10000; i++ {
        val := fmt.Sprintf("%d", i)
        db.Query("UPDATE ut.test SET val = ? WHERE key = 'foo'", val).Exec()

        var result string
        db.Query("SELECT val FROM ut.test WHERE key = 'foo'").Scan(&result)
        if result != val {
            fmt.Printf("Expected %v but got: %v\n", val, result)
        }
    }
}


best practices for time-series data with massive amounts of records

2015-03-02 Thread Clint Kelly
Hi all,

I am designing an application that will capture time-series data where we
expect the number of records per user to potentially be extremely high. I
am not sure if we will eclipse the max row size of 2B cells, but I assume
that we would not want our application to approach that size anyway.

If we wanted to put all of the interactions in a single row, then I would
make a data model that looks like:

CREATE TABLE events (
  id text,
  event_time timestamp,
  event blob,
  PRIMARY KEY (id, event_time))
WITH CLUSTERING ORDER BY (event_time DESC);

The best practice for breaking up large rows of time series data is, as I
understand it, to put part of the time into the partitioning key (
http://planetcassandra.org/getting-started-with-time-series-data-modeling/):

CREATE TABLE events (
  id text,
  date text, // Could also use year+month here, or year+week, or something else
  event_time timestamp,
  event blob,
  PRIMARY KEY ((id, date), event_time))
WITH CLUSTERING ORDER BY (event_time DESC);

The downside of this approach is that we can no longer do a simple
continuous scan to get all of the events for a given user.  Some users may
log lots and lots of interactions every day, while others may interact with
our application infrequently, so I'd like a quick way to get the most
recent interaction for a given user.
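
For concreteness, with the bucketed schema the "most recent interaction"
lookup becomes a bucket-by-bucket probe, newest bucket first (a sketch;
dates invented):

SELECT event_time, event FROM events
WHERE id = 'user1' AND date = '2015-03-02'
LIMIT 1; -- newest event in that bucket, thanks to the DESC clustering order
-- If empty, retry with date = '2015-03-01', and so on backwards, which is
-- exactly what makes infrequently active users painful here.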

Has anyone used different approaches for this problem?

The only thing I can think of is to use the second table schema described
above, but switch to an order-preserving hashing function, and then
manually hash the "id" field.  This is essentially what we would do in
HBase.

Curious if anyone else has any thoughts.

Best regards,
Clint


Re: Less frequent flushing with LCS

2015-03-02 Thread Dan Kinder
I see, thanks for the input. Compression is not enabled at the moment, but
I may try increasing that number regardless.

Also I don't think in-memory tables would work since the dataset is
actually quite large. The pattern is more like a given set of rows will
receive many overwriting updates and then not be touched for a while.
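
For reference, bumping that setting is a one-line schema change (a sketch;
the table name and the 256 MB value are invented):

ALTER TABLE mykeyspace.mytable
WITH compaction = {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 256};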

On Fri, Feb 27, 2015 at 2:27 PM, Robert Coli  wrote:

> On Fri, Feb 27, 2015 at 2:01 PM, Dan Kinder  wrote:
>
>> Theoretically sstable_size_in_mb could be causing it to flush (it's at
>> the default 160MB)... though we are flushing well before we hit 160MB. I
>> have not tried changing this but we don't necessarily want all the sstables
>> to be large anyway,
>>
>
> I've always wished that the log message told you *why* the SSTable was
> being flushed, which of the various bounds prompted the flush.
>
> In your case, the size on disk may be under 160MB because compression is
> enabled. I would start by increasing that size.
>
> Datastax DSE has in-memory tables for this use case.
>
> =Rob
>
>


-- 
Dan Kinder
Senior Software Engineer
Turnitin – www.turnitin.com
dkin...@turnitin.com


RE: sstables remain after compaction

2015-03-02 Thread SEAN_R_DURITY
In my experience, you do not want to stay on 1.1 very long. 1.0.8 was very
stable; 1.1 can get bad in a hurry. 1.2 (with many things moved off-heap)
is very much better.


Sean Durity – Cassandra Admin, Big Data Team

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Monday, March 02, 2015 2:01 PM
To: user@cassandra.apache.org
Subject: Re: sstables remain after compaction

On Sat, Feb 28, 2015 at 5:39 PM, Jason Wee wrote:
Hi Rob, sorry for the late response, festive season here. cassandra version is 
1.0.8 and thank you, I will read on the READ_STAGE threads.

1.0.8 is pretty seriously old in 2015. I would upgrade to at least 1.2.x (via 
1.1.x) ASAP. Your cluster will be much happier, in general.

=Rob






Re: What are the factors that affect the release time of each minor version?

2015-03-02 Thread Aleksey Yeschenko
Hi Phil,

Right now there is no explicit scheme for minor releases scheduling.

Eventually we just decide that it’s time for a new release - usually when the 
CHANGES list feels too long - and start the process.

what are the duties to release a version?
Need to build and eventually publish all the artifacts - that process is 
semi-automated. Need to run all the unit/distributed/long/duration tests on the 
tagged sha. Need to go through the whole voting process. All in all about a 
week.

Coincidentally, we’ve been discussing our release policies recently, and minor 
version releases have been discussed as well. It is likely that we’ll switch to 
scheduled minor releases soon (every 2/3/4 weeks or when a major bug gets fixed 
- whatever comes first).

-- 
AY

On February 28, 2015 at 2:49:25 AM, Phil Yang (ud1...@gmail.com) wrote:

Hi all  

As a user of Cassandra, sometimes there are bugs in my cluster and I hope
someone can fix them (of course, if I can fix them myself I'll try to
contribute my code :) ). For each bug there is a JIRA ticket tracking it,
and users can know whether the bug is fixed.

However, there is a lag between this bug being fixed and a new minor  
version being released. Although we can apply the patch of this ticket to  
our online version and build a special snapshot to solve the trouble in our  
clusters or we can use the latest code directly, I think many users still  
want to use an official release with higher reliability and indeed, more  
convenience. In addition, updating more frequently can also reduce the
trouble caused by unknown bugs. So someone may often ask, "When will the
new version with this patch be released?"

In my mind, not only the number of issues resolved in each version but also  
the time interval between two versions is not fixed. So may I know what the  
factors that affect the release time of each minor version?  

Furthermore, apart from the vote on the dev@cassandra mailing list, which I
can see, what are the duties involved in releasing a version? If it is not
heavy work, could we make releases more frequent? Or we could make a rule
to decide whether we need to release a new version, for example: "If the
latest version was released two weeks ago, or we have already resolved 20
issues since the latest version, we should release a new minor version".

--  
Thanks,  
Phil Yang  


Re: Less frequent flushing with LCS

2015-03-02 Thread Daniel Chia
Do the tables look like they're being flushed every hour? It seems like the
setting memtable_flush_after_mins which I believe defaults to 60 could also
affect how often your tables are flushed.

Thanks,
Daniel

On Mon, Mar 2, 2015 at 11:49 AM, Dan Kinder  wrote:

> I see, thanks for the input. Compression is not enabled at the moment, but
> I may try increasing that number regardless.
>
> Also I don't think in-memory tables would work since the dataset is
> actually quite large. The pattern is more like a given set of rows will
> receive many overwriting updates and then not be touched for a while.
>


Re: Less frequent flushing with LCS

2015-03-02 Thread Dan Kinder
Nope, they flush every 5 to 10 minutes.

On Mon, Mar 2, 2015 at 1:13 PM, Daniel Chia  wrote:

> Do the tables look like they're being flushed every hour? It seems like
> the setting memtable_flush_after_mins which I believe defaults to 60
> could also affect how often your tables are flushed.
>
> Thanks,
> Daniel


-- 
Dan Kinder
Senior Software Engineer
Turnitin – www.turnitin.com
dkin...@turnitin.com


Re: Should a node that is bootstrapping be receiving writes in addition to the streams it is receiving?

2015-03-02 Thread Paulo Ricardo Motta Gomes
I'm also facing a similar issue while bootstrapping a replacement node via
the -Dreplace_address flag. The node is streaming data from its neighbors,
but cfstats shows 0 for all metrics of all CFs on the bootstrapping node:

SSTable count: 0
SSTables in each level: [0, 0, 0, 0, 0, 0, 0, 0, 0]
Space used (live), bytes: 0
Space used (total), bytes: 0
SSTable Compression Ratio: 0.0
Number of keys (estimate): 0
Memtable cell count: 0
Memtable data size, bytes: 0
Memtable switch count: 0
Local read count: 0
Local read latency: 0.000 ms
Local write count: 0
Local write latency: 0.000 ms
Pending tasks: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used, bytes: 0
Compacted partition minimum bytes: 0
Compacted partition maximum bytes: 0
Compacted partition mean bytes: 0
Average live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0

I also checked via JMX and all the write counts are zero. Is the node
supposed to receive writes during bootstrap?

The other funny thing during bootstrap is that nodetool status shows the
bootstrapping node as Up/Normal (UN) instead of Up/Joining (UJ). Is this
expected, or is it a bug? The bootstrapping node does not even appear in
the "nodetool status" output of the other nodes.

UN  X.Y.Z.244  15.9 GB  1  3.7%  52fb21e-4621-4533-b201-8c1a7adbe818  rack

If I do a nodetool netstats, I see:

Mode: JOINING
Bootstrap 647d4b30-c11e-11e4-9249-173e73521fb44

Cheers,

Paulo

On Thu, Oct 16, 2014 at 3:53 PM, Robert Coli  wrote:

> On Wed, Oct 15, 2014 at 10:07 PM, Peter Haggerty <
> peter.hagge...@librato.com> wrote:
>
>> The node wrote gigs of data to various CFs during the bootstrap so it
>> was clearly "writing" in some sense and it has the expected behavior
>> after the bootstrap. Is cfstats correct when it reports that there
>> were no writes during a bootstrap?
>>
>
> As I understand it :
>
> Writes ("extra" writes, from the perspective of replication factor, f/e a
> RF=3 cluster has effective RF=4 during bootstrap, but not relevant for
> consistency purposes until end of bootstrap) occur via the storage protocol
> during bootstrap, so I would expect to see those reflected in cfstats.
>
> I'm relatively confident it is in fact receiving those writes, so your
> confusion might just be a result of how it's reported?
>
> =Rob
> http://twitter.com/rcolidba
>
>


-- 
Paulo Motta

Chaordic | Platform
www.chaordic.com.br
+55 48 3232.3200


RDD partitions per executor in Cassandra Spark Connector

2015-03-02 Thread Rumph, Frens Jan
Hi all,

I didn't find the *issues* button on
https://github.com/datastax/spark-cassandra-connector/ so posting here.

Anyone have an idea why token ranges are grouped into one partition per
executor? I expected at least one per core. Any suggestions on how to work
around this? Doing a repartition is way too expensive, as I just want more
partitions for parallelism, not a reshuffle...

Thanks in advance!
Frens Jan


Re: Should a node that is bootstrapping be receiving writes in addition to the streams it is receiving?

2015-03-02 Thread Robert Coli
On Mon, Mar 2, 2015 at 1:58 PM, Paulo Ricardo Motta Gomes <
paulo.mo...@chaordicsystems.com> wrote:

> I also checked via JMX and all the write counts are zero. Is the node
> supposed to receive writes during bootstrap?
>

As I understand it, yes.

The other funny thing during bootstrap, is that nodetool status shows that
> the bootsrapping node is Up/Normal (UN), instead of Up/Joining(UJ), is this
> expected or is it a bug? The bootstrapping node does not even appear in the
> "nodetool status" of other nodes.
>

Perhaps this node is not actually bootstrapping because you have configured
it as a seed with no other valid seeds listed and so it has started as a
cluster of one?
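
For reference, the seed list in cassandra.yaml on the replacement node
should point at other live nodes, never at the node itself (addresses
invented):

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1,10.0.0.2"   # existing nodes, not this host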

=Rob


Re: Reboot: Read After Write Inconsistent Even On A One Node Cluster

2015-03-02 Thread Robert Coli
On Mon, Mar 2, 2015 at 11:44 AM, Dan Kinder  wrote:

> I have been having the same problem as in this older post:
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201411.mbox/%3CCAORswtz+W4Eg2CoYdnEcYYxp9dARWsotaCkyvS5M7+Uo6HT1=a...@mail.gmail.com%3E
>

As I said on that thread :

"It sounds unreasonable/unexpected to me, if you have a trivial repro case,
I would file a JIRA."

=Rob


Re: Reboot: Read After Write Inconsistent Even On A One Node Cluster

2015-03-02 Thread Dan Kinder
Done: https://issues.apache.org/jira/browse/CASSANDRA-8892

On Mon, Mar 2, 2015 at 3:26 PM, Robert Coli  wrote:

> On Mon, Mar 2, 2015 at 11:44 AM, Dan Kinder  wrote:
>
>> I have been having the same problem as in this older post:
>> http://mail-archives.apache.org/mod_mbox/cassandra-user/201411.mbox/%3CCAORswtz+W4Eg2CoYdnEcYYxp9dARWsotaCkyvS5M7+Uo6HT1=a...@mail.gmail.com%3E
>>
>
> As I said on that thread :
>
> "It sounds unreasonable/unexpected to me, if you have a trivial repro
> case, I would file a JIRA."
>
> =Rob
>
>


-- 
Dan Kinder
Senior Software Engineer
Turnitin – www.turnitin.com
dkin...@turnitin.com


Re: Reboot: Read After Write Inconsistent Even On A One Node Cluster

2015-03-02 Thread Peter Sanford
Hmm. I was able to reproduce the behavior with your go program on my dev
machine (C* 2.0.12). I was hoping it was going to just be an unchecked
error from the .Exec() or .Scan(), but that is not the case for me.

The fact that the issue seems to happen on loop iterations 10, 100, and
1000 is pretty suspicious. I took a tcpdump to confirm that gocql was in
fact sending the "write 100" query and that on the next read Cassandra
responded with "99".

I'll be interested to see what the result of the jira ticket is.

-psanford


Re: Reboot: Read After Write Inconsistent Even On A One Node Cluster

2015-03-02 Thread Dan Kinder
Yeah I thought that was suspicious too, it's mysterious and fairly
consistent. (By the way I had error checking but removed it for email
brevity, but thanks for verifying :) )

On Mon, Mar 2, 2015 at 4:13 PM, Peter Sanford 
wrote:

> Hmm. I was able to reproduce the behavior with your go program on my dev
> machine (C* 2.0.12). I was hoping it was going to just be an unchecked
> error from the .Exec() or .Scan(), but that is not the case for me.
>
> The fact that the issue seems to happen on loop iteration 10, 100 and 1000
> is pretty suspicious. I took a tcpdump to confirm that the gocql was in
> fact sending the "write 100" query and then on the next read Cassandra
> responded with "99".
>
> I'll be interested to see what the result of the jira ticket is.
>
> -psanford
>
>


-- 
Dan Kinder
Senior Software Engineer
Turnitin – www.turnitin.com
dkin...@turnitin.com


Re: Reboot: Read After Write Inconsistent Even On A One Node Cluster

2015-03-02 Thread Peter Sanford
The more I think about it, the more this feels like a column timestamp
issue. If two inserts have the same timestamp, then the values are compared
lexically to decide which one to keep ("99" sorts after "100" because '9' >
'1', which I think explains the "99"/"100" and "999"/"1000" mystery).

We can verify this by also selecting out the WRITETIME of the column:

...
var prevTS int
for i := 0; i < 10000; i++ {
    val := fmt.Sprintf("%d", i)
    db.Query("UPDATE ut.test SET val = ? WHERE key = 'foo'", val).Exec()

    var result string
    var ts int
    db.Query("SELECT val, WRITETIME(val) FROM ut.test WHERE key = 'foo'").Scan(&result, &ts)
    if result != val {
        fmt.Printf("Expected %v but got: %v; (prevTS:%d, ts:%d)\n", val, result, prevTS, ts)
    }
    prevTS = ts
}


When I run it with this change I see that the timestamps are in fact the
same:

Expected 10 but got: 9; (prevTS:1425345839903000, ts:1425345839903000)
Expected 100 but got: 99; (prevTS:1425345839939000, ts:1425345839939000)
Expected 101 but got: 99; (prevTS:1425345839939000, ts:1425345839939000)
Expected 1000 but got: 999; (prevTS:1425345840296000, ts:1425345840296000)


It looks like we're only getting millisecond precision instead of
microsecond precision for the column timestamps?! If you explicitly set the
timestamp when you do the insert, you get actual microsecond precision and
the issue should go away.
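
For example, a sketch of that workaround in the Go program from earlier in
this thread (it assumes a "time" import; inlining the literal is just one
way to pass the timestamp):

// Attach an explicit microsecond-precision client-side timestamp so that
// successive updates cannot collide.
ts := time.Now().UnixNano() / 1000 // microseconds
stmt := fmt.Sprintf("UPDATE ut.test USING TIMESTAMP %d SET val = ? WHERE key = 'foo'", ts)
db.Query(stmt, val).Exec()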

-psanford



Re: Node stuck in joining the ring

2015-03-02 Thread Phil Yang
I encountered a similar situation where streaming could not finish, not
only when joining but also when removing a node. My tricky solution:
restart every node in the cluster before you start the new node. In my
experience the streaming only gets stuck on nodes that have been running
for many days, although I have no idea of the reason.




-- 
Thanks,
Phil Yang