Re: Adding New Nodes/Data Center to an existing Cluster.

2015-09-04 Thread Sachin Nikam
Neha/Sebastian,
Sorry for the typo. We use DSE 4.7 which ships with Cassandra 2.1
Regards
Sachin

On Tue, Sep 1, 2015 at 10:04 PM, Neha Trivedi 
wrote:

> Sachin,
> Hope you are not using Cassandra 2.2 in production?
> regards
> Neha
>
> On Tue, Sep 1, 2015 at 11:20 PM, Sebastian Estevez <
> sebastian.este...@datastax.com> wrote:
>
>> DSE 4.7 ships with Cassandra 2.1 for stability.
>>
>> All the best,
>>
>>
>> Sebastián Estévez
>>
>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>
>> On Tue, Sep 1, 2015 at 12:53 PM, Sachin Nikam  wrote:
>>
>>> @Neha,
>>> We are using DSE 4.7 & Cassandra 2.2
>>>
>>> @Alain,
>>> I will check with our OPS team about repair vs rebuild and get back to
>>> you.
>>> Regards
>>> Sachin
>>>
>>> On Tue, Sep 1, 2015 at 5:59 AM, Alain RODRIGUEZ 
>>> wrote:
>>>
 Hi Sachin,

 You are speaking about a repair, but the proper command to do this is
 "rebuild".

 Did you try adding your DC this way?
 http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
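
 For example, once the new DC's nodes are up and the keyspace replication
 has been altered, each new node can stream its data from the existing DC
 with something like this (the source DC name is illustrative):

 nodetool rebuild -- <name_of_existing_dc>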


 2015-09-01 5:32 GMT+02:00 Neha Trivedi :

> Hi,
> Can you specify which version of Cassandra you are using?
> Can you provide the Error Stack ?
>
> regards
> Neha
>
> On Tue, Sep 1, 2015 at 2:56 AM, Sebastian Estevez <
> sebastian.este...@datastax.com> wrote:
>
>> or https://issues.apache.org/jira/browse/CASSANDRA-8611 perhaps
>>
>> All the best,
>>
>>
>> Sebastián Estévez
>>
>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>
>> On Mon, Aug 31, 2015 at 5:24 PM, Eric Evans 
>> wrote:
>>
>>>
>>> On Mon, Aug 31, 2015 at 1:32 PM, Sachin Nikam 
>>> wrote:
>>>
 When we add 3 more nodes in Data Center B, the repair tool starts
 syncing the data between 2 data centers and then gives up after ~2 
 days.

 Has anybody run into a similar issue before? If so, what is the
 solution?

>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-9624, maybe?
>>>
>>>
>>> --
>>> Eric Evans
>>> eev...@wikimedia.org
>>>
>>
>>
>

>>>
>>
>


SELECT DISTINCT and rows with only static columns

2015-09-04 Thread Kay Johansen
I have a columnfamily with static columns and non-static columns. When I
have a row with values only in the static columns, select distinct will
return that row, whereas select will not.

For example:

cqlsh> select distinct id from members where id = 'zyto-8c0db';

 id
------------
 zyto-8c0db

(1 rows)

cqlsh> select id from members where id = 'zyto-8c0db';

(0 rows)


Now that I know this I can plan for it, but I'm wondering why it happens, so
I can understand the underlying mechanism.

Using Cassandra 2.0.11.83.

Thanks in advance for any insight,
-Kay

-- 
Kay Johansen, Software Craftsman
Pluralsight | democratizing professional training
(801) 784-9007


Re: Order By limitation or bug?

2015-09-04 Thread Tyler Hobbs
This query would be reasonable to support, so I've opened
https://issues.apache.org/jira/browse/CASSANDRA-10271 to fix that.

On Thu, Sep 3, 2015 at 7:48 PM, Alec Collier 
wrote:

> You should be able to execute the following
>
>
>
> SELECT data FROM import_file WHERE roll = 1 AND type = 'foo' ORDER BY
> type, id DESC;
>
>
>
> Essentially, the ORDER BY clause has to specify the clustering columns in
> full, in their declared order. It doesn't know by default that you have
> already effectively filtered by type.
>
>
>
> *Alec Collier* | Workplace Service Design
>
> Corporate Operations Group - Technology | Macquarie Group Limited
>
>
>
> *From:* Robert Wille [mailto:rwi...@fold3.com]
> *Sent:* Friday, 4 September 2015 7:17 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Order By limitation or bug?
>
>
>
> If you only specify the partition key, and none of the clustering columns,
> you can order by in either direction:
>
>
>
> SELECT data FROM import_file WHERE roll = 1 order by type;
>
> SELECT data FROM import_file WHERE roll = 1 order by type DESC;
>
>
>
> These are both valid. Seems like specifying the prefix of the clustering
> columns is just a specialization of an already-supported pattern.
>
>
>
> Robert
>
>
>
> On Sep 3, 2015, at 2:46 PM, DuyHai Doan  wrote:
>
>
>
> Limitation, not bug. The reason ?
>
>
>
> On disk, data are sorted by type first, and FOR EACH type value, the data
> are sorted by id.
>
>
>
> So to do an ORDER BY id, C* would need to perform an in-memory re-ordering;
> not sure how bad that would be for performance. In any case it's currently
> not possible; maybe you should create a JIRA to ask for lifting the
> limitation.
>
>
>
> On Thu, Sep 3, 2015 at 10:27 PM, Robert Wille  wrote:
>
> Given this table:
>
>
>
> CREATE TABLE import_file (
>
>   roll int,
>
>   type text,
>
>   id timeuuid,
>
>   data text,
>
>   PRIMARY KEY ((roll), type, id)
>
> )
>
>
>
> This should be possible:
>
>
>
> SELECT data FROM import_file WHERE roll = 1 AND type = 'foo' ORDER BY id
> DESC;
>
>
>
> but it results in the following error:
>
>
>
> Bad Request: Order by currently only support the ordering of columns
> following their declared order in the PRIMARY KEY
>
>
>
> I am ordering in the declared order in the primary key. I don’t see why
> this couldn’t be supported. Is this a known limitation or a bug?
>
>
>
> In this example, I can get the results I want by omitting the ORDER BY
> clause and adding WITH CLUSTERING ORDER BY (id DESC) to the schema.
> However, now I can only get descending order. I have to choose either
> ascending or descending order. I cannot get both.
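>
> For reference, the variant I mean is simply the same table with the
> clustering order flipped on id:
>
> CREATE TABLE import_file (
>   roll int,
>   type text,
>   id timeuuid,
>   data text,
>   PRIMARY KEY ((roll), type, id)
> ) WITH CLUSTERING ORDER BY (type ASC, id DESC);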
>
>
>
> Robert
>



-- 
Tyler Hobbs
DataStax 


Convert joins in RDBMS to Cassandra

2015-09-04 Thread srinivas s
Hi All,

I am trying to model RDBMS joins in Cassandra. As I am new to Cassandra, I
need your help/suggestions on this. Below is the information regarding the
query.

I have a query in RDBMS as follows:

select t3.name from Table1 t1, Table2 t2, Table3 t3, Table4 t4 where
t2.cust_id = 3 and t4.sid = t1.sid and t1.colid = t2.colid and t4.cid = t3.cid

Now, trying to make a similar query in Cassandra. As per my learning
experience with Cassandra, I came up with the two solutions below (as
Cassandra does not support joins).

Solution 1:

1) Fetch all the records with t2.cust_id = 3
2) Now run another query that applies the condition t4.sid = t1.sid on the
results returned from step 1.
3) Continue the same for all the conditions.

Drawbacks with this approach: for each join, I have to do a network call to
fetch the details. It will also take more time, as I am running multiple
queries.

Solution 2:

1) Create a map table for every possible join (see the sketch below).

Drawbacks with this approach: I think this is not the right approach, so the
join-to-table (map table) mapping idea is not right.

pastebin link for the same: http://pastebin.com/FRAyihPT
Please suggest me on this.
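
For illustration, the kind of denormalized table Solution 2 points toward
might look like this (column names are assumptions based on the SQL above);
the application writes the pre-joined result at insert time:

CREATE TABLE names_by_cust (
  cust_id int,
  name text,
  PRIMARY KEY (cust_id, name)
);

-- the original query then becomes:
-- SELECT name FROM names_by_cust WHERE cust_id = 3;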


Re: who does generate timestamp during the write?

2015-09-04 Thread Tyler Hobbs
Timestamps can come from three different places, in order of precedence
from highest to lowest:
* The CQL query itself through the "USING TIMESTAMP" clause
* The driver (or maybe application) at the protocol level when using the v3
native protocol or higher (which is available in Cassandra 2.1+).  This is
what I recommend using in most cases, because the driver can safely retry
idempotent writes.
* The coordinator node
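
For example, the highest-precedence option can be exercised directly in CQL
(table and values here are hypothetical):

INSERT INTO users (id, name) VALUES (1, 'alice')
USING TIMESTAMP 1441400000000000;

The timestamp is in microseconds since the epoch, which is the convention
Cassandra uses for write timestamps.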

On Fri, Sep 4, 2015 at 1:06 PM, Andrey Ilinykh  wrote:

> I meant the Thrift-based API. If we are talking about CQL, then timestamps
> are generated by the node you are connected to. This is a "client".
>
> On Fri, Sep 4, 2015 at 10:49 AM, ibrahim El-sanosi <
> ibrahimsaba...@gmail.com> wrote:
>
>> Hi Andrey,
>>
>> I just came across this article:
>>
>> "Each cell in a CQL table has a corresponding timestamp which is taken
>> from the clock on *the Cassandra node that orchestrates the write*. When
>> you are reading from a Cassandra cluster the node that coordinates the
>> read will compare the timestamps of the values it fetches. Last write
>> (= highest timestamp) wins and will be returned to the client."
>>
>> What do you think?
>>
>> On Fri, Sep 4, 2015 at 6:41 PM, Andrey Ilinykh 
>> wrote:
>>
>>> The coordinator doesn't generate the timestamp; it is generated by the client.
>>>
>>> On Fri, Sep 4, 2015 at 10:37 AM, ibrahim El-sanosi <
>>> ibrahimsaba...@gmail.com> wrote:
>>>
 Ok, why does the coordinator generate the timestamp, as the write is part
 of the Cassandra process after the client submits the request to Cassandra?

 On Fri, Sep 4, 2015 at 6:29 PM, Andrey Ilinykh 
 wrote:

> Your application.
>
> On Fri, Sep 4, 2015 at 10:26 AM, ibrahim El-sanosi <
> ibrahimsaba...@gmail.com> wrote:
>
>> Dear folks,
>>
>> When we hear about the notion of Last-Write-Wins in Cassandra
>> according to timestamp, *who generates this timestamp during the write:
>> the coordinator, or each individual replica on which the write is going
>> to be stored?*
>>
>>
>> *Regards,*
>>
>>
>>
>> *Ibrahim*
>>
>
>

>>>
>>
>


-- 
Tyler Hobbs
DataStax 


Old SSTables lying around

2015-09-04 Thread Vidur Malik
Hey,

We're running a Cassandra 2.2.0 cluster with 8 nodes. We are doing frequent
updates to our data, we have very few reads, and we are using Leveled
Compaction with an sstable_size_in_mb of 160MB. We don't have that much data
currently, since we're just testing the cluster.
We are seeing the SSTable count increase linearly even though `nodetool
compactionhistory` shows that compactions have definitely run. When I run
nodetool cfstats, I get the following output:

Table: tender_summaries

SSTable count: 56

SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]

Does it make sense that there is such a huge difference between the number
of SSTables in each level and the total count of SSTables? It seems like
old SSTables are lying around and are never cleaned up/compacted. Does this
theory sound plausible? If so, what could be the problem?

Thanks!


Re: Why I can not do a "count(*) ... allow filtering " without facing operation timeout?

2015-09-04 Thread Sebastian Estevez
I hope this is not a production query...

All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Fri, Sep 4, 2015 at 4:34 AM, Tommy Stendahl 
wrote:

> Hi,
>
> Check out CASSANDRA-8899; my guess is that you have to increase the
> timeout in cqlsh.
>
> /Tommy
>
>
> On 2015-09-04 10:31, shahab wrote:
>
> Hi,
>
> This is probably a silly problem, but it is really serious for me. I have
> a cluster of 3 nodes, with replication factor 2. But still I cannot do a
> simple "select count(*) from ..." using either DevCenter or cqlsh.
> Any idea how this can be done?
>
> best,
> /Shahab
>
>
>


tablesnap / tableslurp usage pointers?

2015-09-04 Thread Maciek Sakrejda
Hi,

I'm trying to use tablesnap [1] for disaster recovery backups, and while my
uploads seem to be working fine, I can't figure out how to run the
associated tableslurp tool for restores. If I pass the full S3 path to the
individual table to tableslurp, it will restore that table, but if I try to
pass the path to, e.g., the full keyspace, I get:

LookupError: Cannot find anything to restore from
my-bucket:my-prefix:/my-path

Based on the source [2], it seems to be only looking for `-listdir.json`
files in the same directory, but my directories in S3 only have
`-listdir.json` files for other *files*, not directories. I am running
tablesnap with the `--recursive` option.

Any ideas?

Thanks,
Maciek

[1]: https://github.com/JeremyGrosser/tablesnap
[2]:
https://github.com/JeremyGrosser/tablesnap/blob/master/tableslurp#L122-L124


Re: who does generate timestamp during the write?

2015-09-04 Thread Andrey Ilinykh
I meant the Thrift-based API. If we are talking about CQL, then timestamps
are generated by the node you are connected to. This is a "client".

On Fri, Sep 4, 2015 at 10:49 AM, ibrahim El-sanosi  wrote:

> Hi Andrey,
>
> I just came across this article:
>
> "Each cell in a CQL table has a corresponding timestamp which is taken
> from the clock on *the Cassandra node that orchestrates the write*. When
> you are reading from a Cassandra cluster the node that coordinates the
> read will compare the timestamps of the values it fetches. Last write
> (= highest timestamp) wins and will be returned to the client."
>
> What do you think?
>
> On Fri, Sep 4, 2015 at 6:41 PM, Andrey Ilinykh  wrote:
>
>> The coordinator doesn't generate the timestamp; it is generated by the client.
>>
>> On Fri, Sep 4, 2015 at 10:37 AM, ibrahim El-sanosi <
>> ibrahimsaba...@gmail.com> wrote:
>>
>>> Ok, why does the coordinator generate the timestamp, as the write is
>>> part of the Cassandra process after the client submits the request to
>>> Cassandra?
>>>
>>> On Fri, Sep 4, 2015 at 6:29 PM, Andrey Ilinykh 
>>> wrote:
>>>
 Your application.

 On Fri, Sep 4, 2015 at 10:26 AM, ibrahim El-sanosi <
 ibrahimsaba...@gmail.com> wrote:

> Dear folks,
>
> When we hear about the notion of Last-Write-Wins in Cassandra
> according to timestamp, *who generates this timestamp during the write:
> the coordinator, or each individual replica on which the write is going
> to be stored?*
>
>
> *Regards,*
>
>
>
> *Ibrahim*
>


>>>
>>
>


Re: who does generate timestamp during the write?

2015-09-04 Thread ibrahim El-sanosi
Hi Andrey,

I just came across this article:

"Each cell in a CQL table has a corresponding timestamp which is taken
from the clock on *the Cassandra node that orchestrates the write*. When
you are reading from a Cassandra cluster the node that coordinates the
read will compare the timestamps of the values it fetches. Last write
(= highest timestamp) wins and will be returned to the client."

What do you think?

On Fri, Sep 4, 2015 at 6:41 PM, Andrey Ilinykh  wrote:

> The coordinator doesn't generate the timestamp; it is generated by the client.
>
> On Fri, Sep 4, 2015 at 10:37 AM, ibrahim El-sanosi <
> ibrahimsaba...@gmail.com> wrote:
>
>> Ok, why does the coordinator generate the timestamp, as the write is part
>> of the Cassandra process after the client submits the request to Cassandra?
>>
>> On Fri, Sep 4, 2015 at 6:29 PM, Andrey Ilinykh 
>> wrote:
>>
>>> Your application.
>>>
>>> On Fri, Sep 4, 2015 at 10:26 AM, ibrahim El-sanosi <
>>> ibrahimsaba...@gmail.com> wrote:
>>>
 Dear folks,

 When we hear about the notion of Last-Write-Wins in Cassandra according
 to timestamp, *who generates this timestamp during the write: the
 coordinator, or each individual replica on which the write is going to be
 stored?*


 *Regards,*



 *Ibrahim*

>>>
>>>
>>
>


Re: who does generate timestamp during the write?

2015-09-04 Thread Andrey Ilinykh
The coordinator doesn't generate the timestamp; it is generated by the client.

On Fri, Sep 4, 2015 at 10:37 AM, ibrahim El-sanosi  wrote:

> Ok, why does the coordinator generate the timestamp, as the write is part
> of the Cassandra process after the client submits the request to Cassandra?
>
> On Fri, Sep 4, 2015 at 6:29 PM, Andrey Ilinykh  wrote:
>
>> Your application.
>>
>> On Fri, Sep 4, 2015 at 10:26 AM, ibrahim El-sanosi <
>> ibrahimsaba...@gmail.com> wrote:
>>
>>> Dear folks,
>>>
>>> When we hear about the notion of Last-Write-Wins in Cassandra according
>>> to timestamp, *who generates this timestamp during the write: the
>>> coordinator, or each individual replica on which the write is going to
>>> be stored?*
>>>
>>>
>>> *Regards,*
>>>
>>>
>>>
>>> *Ibrahim*
>>>
>>
>>
>


Re: who does generate timestamp during the write?

2015-09-04 Thread ibrahim El-sanosi
Ok, why does the coordinator generate the timestamp, as the write is part of
the Cassandra process after the client submits the request to Cassandra?

On Fri, Sep 4, 2015 at 6:29 PM, Andrey Ilinykh  wrote:

> Your application.
>
> On Fri, Sep 4, 2015 at 10:26 AM, ibrahim El-sanosi <
> ibrahimsaba...@gmail.com> wrote:
>
>> Dear folks,
>>
>> When we hear about the notion of Last-Write-Wins in Cassandra according
>> to timestamp, *who generates this timestamp during the write: the
>> coordinator, or each individual replica on which the write is going to be
>> stored?*
>>
>>
>> *Regards,*
>>
>>
>>
>> *Ibrahim*
>>
>
>


Re: who does generate timestamp during the write?

2015-09-04 Thread Andrey Ilinykh
Your application.

On Fri, Sep 4, 2015 at 10:26 AM, ibrahim El-sanosi  wrote:

> Dear folks,
>
> When we hear about the notion of Last-Write-Wins in Cassandra according to
> timestamp, *who generates this timestamp during the write: the coordinator,
> or each individual replica on which the write is going to be stored?*
>
>
> *Regards,*
>
>
>
> *Ibrahim*
>


who does generate timestamp during the write?

2015-09-04 Thread ibrahim El-sanosi
Dear folks,

When we hear about the notion of Last-Write-Wins in Cassandra according to
timestamp, *who generates this timestamp during the write: the coordinator,
or each individual replica on which the write is going to be stored?*


*Regards,*



*Ibrahim*


Re: Repair documentation

2015-09-04 Thread Marcus Olsson

It was a typo in the first test, it should be

repair -h 127.0.0.1 -p 7100 repair -pr -hosts 127.0.0.2 127.0.0.3

and not

repair -h 127.0.0.1 -p 7100 repair -pr 127.0.0.2 127.0.0.3

BR
Marcus Olsson

On 09/04/2015 02:50 PM, Marcus Olsson wrote:

Hi,

While checking the repair documentation at
http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsRepair.html
I noticed the line "Use the -hosts option to list the good nodes to use for
repairing the bad nodes. Use -h to name the bad nodes." and below it there
was an example:

nodetool repair -pr -hosts 10.2.2.20 10.2.2.21

which, according to the documentation, should do "A partitioner range repair
of the bad partition on current node using the good partitions on 10.2.2.20
or 10.2.2.21".

Is this correctly documented? I don't seem to be getting the right results
when trying it.

I started up a C* 2.1.9 CCM cluster and when running

repair -h 127.0.0.1 -p 7100 repair -pr 127.0.0.2 127.0.0.3

I get the error:

nodetool: Keyspace [127.0.0.3] does not exist.

---

When I run it as

nodetool -h 127.0.0.1 -p 7100 repair -pr -hosts 127.0.0.2

instead, it gives me the error:

java.lang.RuntimeException: Primary range repair should be performed on all
nodes in the cluster.
    at org.apache.cassandra.tools.NodeTool$Repair.execute(NodeTool.java:1873)
    at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:288)
    at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)

---

I even tried running it as

repair -h 127.0.0.1 -p 7100 repair -hosts 127.0.0.2

and then I get:

The current host must be part of the repair

---

This seems like either bug(s) or a documentation mistake?

There is also a line in
http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_repair_nodes_c.html
which says "You can specify which nodes have the good data for replacing the
outdated data.", which seems to be related (and is also the reason I tried
it out).

BR
Marcus Olsson




Repair documentation

2015-09-04 Thread Marcus Olsson

Hi,

While checking the repair documentation at
http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsRepair.html
I noticed the line "Use the -hosts option to list the good nodes to use for
repairing the bad nodes. Use -h to name the bad nodes." and below it there
was an example:

nodetool repair -pr -hosts 10.2.2.20 10.2.2.21

which, according to the documentation, should do "A partitioner range repair
of the bad partition on current node using the good partitions on 10.2.2.20
or 10.2.2.21".

Is this correctly documented? I don't seem to be getting the right results
when trying it.

I started up a C* 2.1.9 CCM cluster and when running

repair -h 127.0.0.1 -p 7100 repair -pr 127.0.0.2 127.0.0.3

I get the error:

nodetool: Keyspace [127.0.0.3] does not exist.

---

When I run it as

nodetool -h 127.0.0.1 -p 7100 repair -pr -hosts 127.0.0.2

instead, it gives me the error:

java.lang.RuntimeException: Primary range repair should be performed on all
nodes in the cluster.
    at org.apache.cassandra.tools.NodeTool$Repair.execute(NodeTool.java:1873)
    at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:288)
    at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)

---

I even tried running it as

repair -h 127.0.0.1 -p 7100 repair -hosts 127.0.0.2

and then I get:

The current host must be part of the repair

---

This seems like either bug(s) or a documentation mistake?

There is also a line in
http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_repair_nodes_c.html
which says "You can specify which nodes have the good data for replacing the
outdated data.", which seems to be related (and is also the reason I tried
it out).

BR
Marcus Olsson


Re: Incremental repair from the get go

2015-09-04 Thread Marcus Eriksson
Starting up fresh, it is totally OK to just start using incremental repairs.
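
E.g., on a fresh cluster that just means running something like the
following (my understanding is that on 2.1 incremental repair has to run in
parallel mode, hence -par):

nodetool repair -par -inc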

On Thu, Sep 3, 2015 at 10:25 PM, Jean-Francois Gosselin <
jfgosse...@gmail.com> wrote:

>
> On fresh install of Cassandra what's the best approach to start using
> incremental repair from the get go (I'm using LCS) ?
>
> Run nodetool repair -inc after inserting a few rows, or do we still need
> to follow the migration procedure with sstablerepairedset?
>
> From the documentation "... If you use the leveled compaction strategy
> and perform an incremental repair for the first time, Cassandra performs
> size-tiering on all SSTables because the repair/unrepaired status is
> unknown. This operation can take a long time. To save time, migrate to
> incremental repair one node at a time. ..."
>
> With almost no data, size-tiering should be quick? Basically, is there a
> shortcut to avoid the migration via sstablerepairedset on a fresh install?
>
> Thanks
>
> JF
>


Re: Data Size on each node

2015-09-04 Thread Alprema
Hi,

I agree with Alain, we have the same kind of problem here (4 DCs, ~1TB /
node) and we are replacing our big servers full of spinning drives with a
bigger number of smaller servers with SSDs (microservers are quite
efficient in terms of rack space and cost).

Kévin

On Tue, Sep 1, 2015 at 1:11 PM, Alain RODRIGUEZ  wrote:

> Hi,
>
> Our migration to SSD (from m1.xl to i2.2xl on AWS) has been a big win. I
> mean we went from 80/90% disk utilisation to 20% max. Basically, the
> bottleneck is not disk performance anymore in our case. We got rid of one
> of our major issues, which was disk contention.
>
> I highly recommend going ahead with this, even more so with such a big
> data set. Yet it will probably be more expensive per node.
>
> Another solution for you might be adding nodes (to have less to handle
> per node and make maintenance operations like repair, bootstrap,
> decommission, ... faster).
>
> C*heers,
>
> Alain
>
>
>
>
> 2015-09-01 10:17 GMT+02:00 Sachin Nikam :
>
>> We currently have a Cassandra cluster spread over 2 DCs. The data size on
>> each node of the cluster is 1.2TB with spinning disks. Minor and major
>> compactions are slowing down our read queries. It has been suggested that
>> replacing spinning disks with SSDs might help. Has anybody done something
>> similar? If so, what have the results been?
>> Also if we go with SSD, how big can each node get for commercially
>> available SSDs?
>> Regards
>> Sachin
>>
>
>


Re: Why I can not do a "count(*) ... allow filtering " without facing operation timeout?

2015-09-04 Thread Tommy Stendahl

Hi,

Check out CASSANDRA-8899; my guess is that you have to increase the
timeout in cqlsh.
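
For example (an assumption based on the cqlsh of that era, so please verify
against your version), you can raise the client-side timeout in
~/.cassandra/cqlshrc:

[connection]
client_timeout = 3600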


/Tommy

On 2015-09-04 10:31, shahab wrote:

Hi,

This is probably a silly problem, but it is really serious for me. I
have a cluster of 3 nodes, with replication factor 2. But still I cannot
do a simple "select count(*) from ..." using either DevCenter or cqlsh.
Any idea how this can be done?


best,
/Shahab




Why I can not do a "count(*) ... allow filtering " without facing operation timeout?

2015-09-04 Thread shahab
Hi,

This is probably a silly problem, but it is really serious for me. I have
a cluster of 3 nodes, with replication factor 2. But still I cannot do a
simple "select count(*) from ..." using either DevCenter or cqlsh.
Any idea how this can be done?

best,
/Shahab


Re: Cassandra 2.2 for time series

2015-09-04 Thread Kévin LOVATO
This solution is very "SQL-like", meaning that you query what you want when
you need it. Unfortunately this will probably not scale as your data grows,
so you might want to consider de-normalizing your data. You could maintain
the min/max/average in the application that inserts the data, or have a
batch that runs periodically to precompute the data.
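
For instance, a pre-aggregated table could look like this (a sketch; the
names and the hourly bucketing are illustrative):

CREATE TABLE hourly_rollup (
  channelId int,
  bucket timestamp,      -- sampleTime truncated to the hour
  min_value double,
  min_time timestamp,    -- when the minimum was observed
  max_value double,
  max_time timestamp,    -- when the maximum was observed
  avg_value double,
  PRIMARY KEY (channelId, bucket)
);

The inserting application (or the periodic batch) updates the row for the
current bucket, and a query like "hourly averages for yesterday" becomes a
simple slice on bucket.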

On Thu, Sep 3, 2015 at 6:59 AM, Kevin Burton  wrote:

> Check out KairosDB for a time series DB on Cassandra.
> On Aug 31, 2015 7:12 AM, "Peter Lin"  wrote:
>
>>
>> I didn't realize they had added max and min as stock functions.
>>
>> To get the sample time, you'll probably need to write a custom function.
>> Google for it and you'll find people that have done it.
>>
>> On Mon, Aug 31, 2015 at 10:09 AM, Pål Andreassen <
>> pal.andreas...@bouvet.no> wrote:
>>
>>> Cassandra 2.2 has min and max built-in. My problem is getting the
>>> corresponding sample time as well.
>>>
>>>
>>>
>>> *Pål Andreassen*
>>>
>>> *54°23'58"S 3°18'53"E*
>>>
>>> *Konsulent*
>>>
>>> Mobil +47 982 85 504
>>>
>>> pal.andreas...@bouvet.no
>>>
>>>
>>>
>>>
>>> *Bouvet Norge AS Avdeling Grenland*
>>>
>>> Uniongata 18, Klosterøya
>>>
>>> N-3732 Skien
>>>
>>> Tlf +47 23 40 60 00
>>>
>>> *bouvet.no*
>>> 
>>>
>>>
>>>
>>> *From:* Peter Lin [mailto:wool...@gmail.com]
>>> *Sent:* mandag 31. august 2015 16.09
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: Cassandra 2.2 for time series
>>>
>>>
>>>
>>>
>>>
>>> Unlike SQL, CQL doesn't have built-in functions like max/min
>>>
>>> In the past, people would create summary tables to keep rolling stats
>>> for reports/analytics. In CQL3, there are user-defined functions, so you
>>> can write a function to do max/min.
>>>
>>> http://cassandra.apache.org/doc/cql3/CQL-2.2.html#selectStmt
>>> http://cassandra.apache.org/doc/cql3/CQL-2.2.html#udfs
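>>>
>>> As a minimal sketch of the UDF syntax (the name and body are
>>> illustrative; returning the max together with its timestamp would
>>> additionally need a user-defined aggregate):
>>>
>>> CREATE FUNCTION maxOf (a double, b double)
>>>   RETURNS NULL ON NULL INPUT
>>>   RETURNS double
>>>   LANGUAGE java
>>>   AS 'return Math.max(a, b);';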
>>>
>>>
>>>
>>> On Mon, Aug 31, 2015 at 9:48 AM, Pål Andreassen <
>>> pal.andreas...@bouvet.no> wrote:
>>>
>>> Hi
>>>
>>>
>>>
>>> I’m currently evaluating Cassandra as a potential database for storing
>>> time series data from lots of devices (IoT type of scenario).
>>>
>>> Currently we have a few thousand devices with X channels (measurements)
>>> that they report at different intervals (from 5 minutes and up).
>>>
>>>
>>>
>>> I’ve created as simple test table to store the data:
>>>
>>>
>>>
>>> CREATE TABLE DataRaw(
>>>   channelId int,
>>>   sampleTime timestamp,
>>>   value double,
>>>   PRIMARY KEY (channelId, sampleTime)
>>> ) WITH CLUSTERING ORDER BY (sampleTime ASC);
>>>
>>>
>>>
>>> This schema seems to work OK, but I have queries that I need to support
>>> that I cannot easily figure out how to perform (except getting all the
>>> data out and iterating over it myself).
>>>
>>>
>>>
>>> Query 1: For max and min queries, I not only want the maximum/minimum
>>> value, but also the corresponding timestamp.
>>>
>>>
>>>
>>> sampleTime          value
>>> 2015-08-28 00:00    10
>>> 2015-08-28 01:00    15
>>> 2015-08-28 02:00    13
>>>
>>>
>>> I'd like the max query to return both 2015-08-28 01:00 and 15. SELECT
>>> sampleTime, max(value) FROM DataRaw returns the max value, but the first
>>> sampleTime.
>>>
>>> Also, I wonder if Cassandra has built-in support for
>>> interpolation/extrapolation: some sort of group-by-hour/day/week/month
>>> (and even year) function.
>>>
>>>
>>>
>>> Query 2: Give me hourly averages for channel X for yesterday. I’d expect
>>> to get 24 values, each of which is the hourly average. Or give me daily
>>> averages for last year for a given channel, which should return 365 daily
>>> averages.
>>>
>>>
>>>
>>> Best regards
>>>
>>>
>>>
>>> *Pål Andreassen*
>>>
>>> *54°23'58"S 3°18'53"E*
>>>
>>> *Konsulent*
>>>
>>> Mobil +47 982 85 504
>>>
>>> pal.andreas...@bouvet.no
>>>
>>>
>>>
>>>
>>> *Bouvet Norge AS Avdeling Grenland*
>>>
>>> Uniongata 18, Klosterøya
>>>
>>> N-3732 Skien
>>>
>>> Tlf +47 23 40 60 00
>>>
>>> *bouvet.no*
>>> 
>>>
>>>
>>>
>>>
>>>
>>
>>