Commit-log structure changes - versions

2019-02-07 Thread Sreenivasulu Nallapati
Hello folks,

I am exploring the CDC option to move data from Cassandra to Hive on a
periodic basis.
While exploring this option, I have heard that the internal
commit-log structure may change from version to version. Is this correct?

As per this link, sstables have changed multiple times across versions.
I want to understand the commit-log internal structure as well. Has the
commit-log file structure changed across Cassandra versions? If so, can someone
please point me to the docs/changelog?

Please help me understand more on this. Thanks in advance.

Thanks
Sreeni


Re: Two datacenters with one cassandra node in each datacenter

2019-02-07 Thread Kunal
Hi Dinesh,

We have a very small setup and the data size is also very small. Max data size
is around 2 GB. The latency expectation is around 10-15 ms.


Regards,
Kunal

On Wed, Feb 6, 2019 at 11:27 PM dinesh.jo...@yahoo.com.INVALID
 wrote:

> You also want to use Cassandra with a minimum of 3 nodes.
>
> Dinesh
>
>
> On Wednesday, February 6, 2019, 11:26:07 PM PST, dinesh.jo...@yahoo.com <
> dinesh.jo...@yahoo.com> wrote:
>
>
> Hey Kunal,
>
> Can you add more details about the size of data, read/write throughput,
> what are your latency expectations, etc? What do you mean by "performance"
> issue with replication? Without these details it's a bit tough to answer
> your questions.
>
> Dinesh
>
>
> On Wednesday, February 6, 2019, 3:47:05 PM PST, Kunal <
> kunal.v...@gmail.com> wrote:
>
>
> Hi All,
>
> I need some recommendations on using two datacenters with one node in each
> datacenter.
>
> In our organization, we are trying to have two Cassandra datacenters with
> only 1 node on each side. From the preliminary investigation, I see
> replication is happening, but I want to know if we can use this deployment
> in production. Will there be any performance issues with replication?
>
> We have already setup 2 datacenters with one node on each datacenter and
> replication is working fine.
>
> Can you please let me know if this kind of setup is recommended for
> production deployment?
> Thanks in anticipation.
>
> Regards,
> Kunal Vaid
>


-- 



Regards,
Kunal Vaid
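For reference, the two-datacenter replication Kunal describes is typically defined with NetworkTopologyStrategy and one replica per DC; keyspace and datacenter names below are placeholders:

```sql
-- One replica in each of two datacenters (names are hypothetical;
-- they must match the DC names reported by `nodetool status`).
CREATE KEYSPACE IF NOT EXISTS myks
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 1,
    'dc2': 1
  };
```

With a single node per DC there is no redundancy within a datacenter, which is why a minimum of three nodes per DC is usually advised for production.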


RE: range repairs multiple dc

2019-02-07 Thread Kenneth Brotman
This webpage has relevant information on procedures you need to use: 
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html

 

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Thursday, February 07, 2019 1:31 PM
To: user@cassandra.apache.org
Subject: RE: range repairs multiple dc

 

A nice article on The Last Pickle blog at 
http://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html 
should be helpful to you.  A line in the comments following the article states:

 

“So restricting a -pr repair on a specific datacenter will be forbidden by 
Cassandra to prevent an incomplete repair from being performed.”

 

Give it a read.

 

Kenneth Brotman

 

From: CPC [mailto:acha...@gmail.com] 
Sent: Wednesday, February 06, 2019 11:59 PM
To: user@cassandra.apache.org
Subject: range repairs multiple dc

 

Hi All,

 

I searched the documentation but could not find a clear reference regarding the 
-pr option. In some places it says you have to cover the whole ring; in other 
places it says you have to run it on every node regardless of whether you have 
multiple DCs.

 

In our case we have three DCs (DC1, DC2, DC3), with each DC having 4 nodes and 
12 nodes in the cluster in total. If I run "nodetool repair -pr --full" on every 
node in DC1, does it mean DC1 is consistent but DC2 and DC3 are not, or that 
DC1 is not consistent at all? In our case we added DC3 to our cluster and will 
remove DC2 from the cluster, so I don't care whether DC2 has consistent data. I 
don't want to repair DC2.

 

Also, can I run "nodetool repair -pr --full" in parallel? I mean run it at the 
same time in each DC, or run it on more than one node in the same DC? Does the 
-dcpar option do the same thing?

 

Best Regards...
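The per-node repair being discussed can be sketched as a simple loop over the nodes of the DC you care about (node names are hypothetical; the echo makes this a dry run — remove it to actually execute over SSH):

```shell
# Dry-run sketch: a full primary-range repair on every node of DC1.
# -pr repairs only each node's primary token ranges; --full forces a
# non-incremental repair. Node names are placeholders.
NODES="dc1-node1 dc1-node2 dc1-node3 dc1-node4"
for n in $NODES; do
  cmd="ssh $n nodetool repair --full -pr"
  echo "$cmd"   # print instead of executing
done
```

Note that, as the blog comment quoted above says, combining -pr with a single-datacenter restriction is rejected by Cassandra, since a primary-range repair only covers the whole dataset when run across the entire ring.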





Re: How to read the Index.db file

2019-02-07 Thread Ben Slater
They don’t do exactly what you want but depending on why you are trying to
get this info you might find our sstable-tools useful:
https://github.com/instaclustr/cassandra-sstable-tools
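For illustration, here is a hedged sketch of pulling the partition keys straight out of an Index.db file. It assumes the 3.x "mc" layout (per entry: 2-byte big-endian key length, raw key bytes, an unsigned-vint data-file position, an unsigned-vint promoted-index size, then that many promoted-index bytes) — verify against your Cassandra version before relying on it:

```python
import struct

def read_unsigned_vint(buf, off):
    """Cassandra-style unsigned vint: the count of leading 1-bits in the
    first byte gives the number of extra bytes that follow."""
    first = buf[off]
    extra = 0
    while extra < 8 and (first << extra) & 0x80:
        extra += 1
    value = first & (0xFF >> extra) if extra < 8 else 0
    for i in range(extra):
        value = (value << 8) | buf[off + 1 + i]
    return value, off + 1 + extra

def partition_keys(path):
    """Return raw partition-key bytes from an Index.db file (assumed layout)."""
    data = open(path, "rb").read()
    off, keys = 0, []
    while off < len(data):
        (klen,) = struct.unpack_from(">H", data, off)    # 2-byte key length
        off += 2
        keys.append(data[off:off + klen])                # raw serialized key
        off += klen
        _position, off = read_unsigned_vint(data, off)   # Data.db offset
        psize, off = read_unsigned_vint(data, off)       # promoted index size
        off += psize                                     # skip promoted index
    return keys
```

The keys come back as raw serialized bytes; for composite partition keys you would still need the table schema to decode them, which is part of why schema-aware tools like sstabledump or the sstable-tools linked above are usually the easier route.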

---


*Ben Slater*
*Chief Product Officer*

Read our latest technical blog posts here.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


On Fri, 8 Feb 2019 at 08:14, Kenneth Brotman 
wrote:

> When you say you’re trying to get all the partitions of a particular
> SSTable, I’m not sure what you mean.  Do you want to make a copy of it?  I
> don’t understand.
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Pranay akula [mailto:pranay.akula2...@gmail.com]
> *Sent:* Wednesday, February 06, 2019 7:51 PM
> *To:* user@cassandra.apache.org
> *Subject:* How to read the Index.db file
>
>
>
> I was trying to get all the partitions of a particular SSTable. I have
> tried reading the Index.db file; I can read some part of it, but not all of
> it. Is there any way to convert it to a readable format?
>
>
>
>
>
> Thanks
>
> Pranay
>


RE: How to read the Index.db file

2019-02-07 Thread Kenneth Brotman
When you say you’re trying to get all the partitions of a particular SSTable, 
I’m not sure what you mean.  Do you want to make a copy of it?  I don’t 
understand.

 

Kenneth Brotman

 

From: Pranay akula [mailto:pranay.akula2...@gmail.com] 
Sent: Wednesday, February 06, 2019 7:51 PM
To: user@cassandra.apache.org
Subject: How to read the Index.db file

 

I was trying to get all the partitions of a particular SSTable. I have tried 
reading the Index.db file; I can read some part of it, but not all of it. Is 
there any way to convert it to a readable format?

 

 

Thanks

Pranay



Re: Bootstrap keeps failing

2019-02-07 Thread Kenneth Brotman
Lots of things come to mind. We need more information from you to help us 
understand:

How long have you had your cluster running?

Is it generally working ok?
Is it just one node that is misbehaving at a time?

How many nodes do you need to replace?

Are you doing rolling restarts instead of restarting nodes simultaneously?

Do you have enough capacity on your machines?  Did you say some of the nodes 
are at 90% capacity?

When did this problem begin?

Could something be causing a race condition?

Did you recheck the commands you used to make sure they are correct?

What procedure do you use?

 

 

From: Léo FERLIN SUTTON [mailto:lfer...@mailjet.com.INVALID] 
Sent: Thursday, February 07, 2019 9:16 AM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: Bootstrap keeps failing

 

Thank you for the recommendation. 

 

We are already using DataStax's recommended settings for tcp_keepalive.

 

Regards,

 

Leo

 

On Thu, Feb 7, 2019 at 5:49 PM Durity, Sean R  
wrote:

I have seen unreliable streaming (streaming that doesn’t finish) because of TCP 
timeouts from firewalls or switches. The default tcp_keepalive kernel 
parameters are usually not tuned for that. See 
https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/idleFirewallLinux.html
 for more details. These “remote” timeouts are difficult to detect or prove if 
you don’t have access to the intermediate network equipment.

 

Sean Durity

From: Léo FERLIN SUTTON  
Sent: Thursday, February 07, 2019 10:26 AM
To: user@cassandra.apache.org; dinesh.jo...@yahoo.com
Subject: [EXTERNAL] Re: Bootstrap keeps failing

 

Hello ! 

Thank you for your answers.

 

So I have tried, multiple times, to start bootstrapping from scratch. I often 
have the same problem (on other nodes as well) but sometimes it works and I can 
move on to another node.

 

I have joined a jstack dump and some logs.

 

Our node was shut down at around 97% disk space used.

I turned it back on and it started the bootstrap process again.

 

The log file is the log from this attempt, same for the thread dump.

 

Small warning, I have somewhat anonymised the log files so there may be some 
inconsistencies.

 

Regards,

 

Leo

 

On Thu, Feb 7, 2019 at 8:13 AM dinesh.jo...@yahoo.com.INVALID 
mailto:dinesh.joshi@yahoo.com.invalid> > wrote:

Would it be possible for you to take a thread dump & logs and share them?

 

Dinesh

 

 

On Wednesday, February 6, 2019, 10:09:11 AM PST, Léo FERLIN SUTTON 
 wrote: 

 

 

Hello !

 

I am having a recurrent problem when trying to bootstrap a few new nodes.

 

Some general info : 

*   I am running cassandra 3.0.17
*   We have about 30 nodes in our cluster
*   All healthy nodes have between 60% to 90% used disk space on 
/var/lib/cassandra

So I create a new node and let auto_bootstrap do its job. After a few days the 
bootstrapping node stops streaming new data but is still not a member of the 
cluster.

 

`nodetool status` says the node is still joining.

 

When this happens I run `nodetool bootstrap resume`. This usually ends up in 
two different ways :

1.  The node fills up to 100% disk space and crashes.
2.  The bootstrap resume finishes with errors

When I look at `nodetool netstats -H` it looks like `bootstrap resume` does 
not resume but restarts a full transfer of all data from every node.

 

This is the output I get from `nodetool resume` :

[2019-02-06 01:39:14,369] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-225-big-Data.db
 (progress: 2113%)

[2019-02-06 01:39:16,821] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-88-big-Data.db
 (progress: 2113%)

[2019-02-06 01:39:17,003] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-89-big-Data.db
 (progress: 2113%)

[2019-02-06 01:39:17,032] session with /10.16.XX.YYY complete (progress: 2113%)

[2019-02-06 01:41:15,160] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-220-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:02,864] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-226-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:09,284] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-227-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:10,522] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-228-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:10,622] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-229-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:11,925] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-90-big-Data.db
 (progress: 2114%)

[2019-02-06 01:42:14,887] received file 

Re: [EXTERNAL] Re: Bootstrap keeps failing

2019-02-07 Thread Léo FERLIN SUTTON
Thank you for the recommendation.

We are already using datastax's recommended settings for tcp_keepalive.

Regards,

Leo

On Thu, Feb 7, 2019 at 5:49 PM Durity, Sean R 
wrote:

> I have seen unreliable streaming (streaming that doesn’t finish) because
> of TCP timeouts from firewalls or switches. The default tcp_keepalive
> kernel parameters are usually not tuned for that. See
> https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/idleFirewallLinux.html
> for more details. These “remote” timeouts are difficult to detect or prove
> if you don’t have access to the intermediate network equipment.
>
>
>
> Sean Durity
>
> *From:* Léo FERLIN SUTTON 
> *Sent:* Thursday, February 07, 2019 10:26 AM
> *To:* user@cassandra.apache.org; dinesh.jo...@yahoo.com
> *Subject:* [EXTERNAL] Re: Bootstrap keeps failing
>
>
>
> Hello !
>
> Thank you for your answers.
>
>
>
> So I have tried, multiple times, to start bootstrapping from scratch. I
> often have the same problem (on other nodes as well) but sometimes it works
> and I can move on to another node.
>
>
>
> I have joined a jstack dump and some logs.
>
>
>
> Our node was shut down at around 97% disk space used.
>
> I turned it back on and it started the bootstrap process again.
>
>
>
> The log file is the log from this attempt, same for the thread dump.
>
>
>
> Small warning, I have somewhat anonymised the log files so there may be
> some inconsistencies.
>
>
>
> Regards,
>
>
>
> Leo
>
>
>
> On Thu, Feb 7, 2019 at 8:13 AM dinesh.jo...@yahoo.com.INVALID <
> dinesh.jo...@yahoo.com.invalid> wrote:
>
> Would it be possible for you to take a thread dump & logs and share them?
>
>
>
> Dinesh
>
>
>
>
>
> On Wednesday, February 6, 2019, 10:09:11 AM PST, Léo FERLIN SUTTON <
> lfer...@mailjet.com.INVALID> wrote:
>
>
>
>
>
> Hello !
>
>
>
> I am having a recurrent problem when trying to bootstrap a few new nodes.
>
>
>
> Some general info :
>
>- I am running cassandra 3.0.17
>- We have about 30 nodes in our cluster
>- All healthy nodes have between 60% to 90% used disk space on
>/var/lib/cassandra
>
> So I create a new node and let auto_bootstrap do its job. After a few
> days the bootstrapping node stops streaming new data but is still not a
> member of the cluster.
>
>
>
> `nodetool status` says the node is still joining.
>
>
>
> When this happens I run `nodetool bootstrap resume`. This usually ends up
> in two different ways :
>
>1. The node fills up to 100% disk space and crashes.
>2. The bootstrap resume finishes with errors
>
> When I look at `nodetool netstats -H` it looks like `bootstrap resume`
> does not resume but restarts a full transfer of all data from every node.
>
>
>
> This is the output I get from `nodetool resume` :
>
> [2019-02-06 01:39:14,369] received file
> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-225-big-Data.db
> (progress: 2113%)
>
> [2019-02-06 01:39:16,821] received file
> /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-88-big-Data.db
> (progress: 2113%)
>
> [2019-02-06 01:39:17,003] received file
> /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-89-big-Data.db
> (progress: 2113%)
>
> [2019-02-06 01:39:17,032] session with /10.16.XX.YYY complete (progress:
> 2113%)
>
> [2019-02-06 01:41:15,160] received file
> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-220-big-Data.db
> (progress: 2113%)
>
> [2019-02-06 01:42:02,864] received file
> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-226-big-Data.db
> (progress: 2113%)
>
> [2019-02-06 01:42:09,284] received file
> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-227-big-Data.db
> (progress: 2113%)
>
> [2019-02-06 01:42:10,522] received file
> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-228-big-Data.db
> (progress: 2113%)
>
> [2019-02-06 01:42:10,622] received file
> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-229-big-Data.db
> (progress: 2113%)
>
> [2019-02-06 01:42:11,925] received file
> /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-90-big-Data.db
> (progress: 2114%)
>
> [2019-02-06 01:42:14,887] received file
> /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-91-big-Data.db
> (progress: 2114%)
>
> [2019-02-06 01:42:14,980] session with /10.16.XX.ZZZ complete (progress:
> 2114%)
>
> [2019-02-06 01:42:14,980] Stream failed
>
> [2019-02-06 01:42:14,982] Error during bootstrap: Stream failed
>
> [2019-02-06 01:42:14,982] Resume bootstrap complete
>
>
>
> The bootstrap `progress` goes way over 100% and eventually fails.
>
>
>
>
>
> Right now I have a node with this output from `nodetool status` :
>
> `UJ  10.16.XX.YYY  2.93 TB256  ?
>  5788f061-a3c0-46af-b712-ebeecd397bf7  c`
>
>
>
> It is almost filled with data, yet if I look at 

RE: [EXTERNAL] Re: Bootstrap keeps failing

2019-02-07 Thread Durity, Sean R
I have seen unreliable streaming (streaming that doesn’t finish) because of TCP 
timeouts from firewalls or switches. The default tcp_keepalive kernel 
parameters are usually not tuned for that. See 
https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/idleFirewallLinux.html
 for more details. These “remote” timeouts are difficult to detect or prove if 
you don’t have access to the intermediate network equipment.
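The keepalive tuning that DataStax document recommends boils down to three kernel parameters (values taken from that doc; treat them as a starting point for your environment and persist them in /etc/sysctl.conf):

```shell
# Probe idle TCP connections after 60s, every 10s, give up after 3 failures.
# Requires root; these are DataStax's suggested values for the idle-firewall
# scenario described above.
sysctl -w net.ipv4.tcp_keepalive_time=60
sysctl -w net.ipv4.tcp_keepalive_probes=3
sysctl -w net.ipv4.tcp_keepalive_intvl=10
```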

Sean Durity
From: Léo FERLIN SUTTON 
Sent: Thursday, February 07, 2019 10:26 AM
To: user@cassandra.apache.org; dinesh.jo...@yahoo.com
Subject: [EXTERNAL] Re: Bootstrap keeps failing

Hello !

Thank you for your answers.

So I have tried, multiple times, to start bootstrapping from scratch. I often 
have the same problem (on other nodes as well) but sometimes it works and I can 
move on to another node.

I have joined a jstack dump and some logs.

Our node was shut down at around 97% disk space used.
I turned it back on and it started the bootstrap process again.

The log file is the log from this attempt, same for the thread dump.

Small warning, I have somewhat anonymised the log files so there may be some 
inconsistencies.

Regards,

Leo

On Thu, Feb 7, 2019 at 8:13 AM 
dinesh.jo...@yahoo.com.INVALID 
mailto:dinesh.jo...@yahoo.com.invalid>> wrote:
Would it be possible for you to take a thread dump & logs and share them?

Dinesh


On Wednesday, February 6, 2019, 10:09:11 AM PST, Léo FERLIN SUTTON 
mailto:lfer...@mailjet.com.INVALID>> wrote:


Hello !

I am having a recurrent problem when trying to bootstrap a few new nodes.

Some general info :

  *   I am running cassandra 3.0.17
  *   We have about 30 nodes in our cluster
  *   All healthy nodes have between 60% to 90% used disk space on 
/var/lib/cassandra
So I create a new node and let auto_bootstrap do its job. After a few days the 
bootstrapping node stops streaming new data but is still not a member of the 
cluster.

`nodetool status` says the node is still joining.

When this happens I run `nodetool bootstrap resume`. This usually ends up in 
two different ways :

  1.  The node fills up to 100% disk space and crashes.
  2.  The bootstrap resume finishes with errors
When I look at `nodetool netstats -H` it looks like `bootstrap resume` does 
not resume but restarts a full transfer of all data from every node.

This is the output I get from `nodetool resume` :
[2019-02-06 01:39:14,369] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-225-big-Data.db
 (progress: 2113%)
[2019-02-06 01:39:16,821] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-88-big-Data.db
 (progress: 2113%)
[2019-02-06 01:39:17,003] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-89-big-Data.db
 (progress: 2113%)
[2019-02-06 01:39:17,032] session with /10.16.XX.YYY complete (progress: 2113%)
[2019-02-06 01:41:15,160] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-220-big-Data.db
 (progress: 2113%)
[2019-02-06 01:42:02,864] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-226-big-Data.db
 (progress: 2113%)
[2019-02-06 01:42:09,284] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-227-big-Data.db
 (progress: 2113%)
[2019-02-06 01:42:10,522] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-228-big-Data.db
 (progress: 2113%)
[2019-02-06 01:42:10,622] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-229-big-Data.db
 (progress: 2113%)
[2019-02-06 01:42:11,925] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-90-big-Data.db
 (progress: 2114%)
[2019-02-06 01:42:14,887] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-91-big-Data.db
 (progress: 2114%)
[2019-02-06 01:42:14,980] session with /10.16.XX.ZZZ complete (progress: 2114%)
[2019-02-06 01:42:14,980] Stream failed
[2019-02-06 01:42:14,982] Error during bootstrap: Stream failed
[2019-02-06 01:42:14,982] Resume bootstrap complete

The bootstrap `progress` goes way over 100% and eventually fails.


Right now I have a node with this output from `nodetool status` :
`UJ  10.16.XX.YYY  2.93 TB256  ? 
5788f061-a3c0-46af-b712-ebeecd397bf7  c`

It is almost filled with data, yet if I look at `nodetool netstats` :
Receiving 480 files, 325.39 GB total. Already received 5 files, 68.32 
MB total
Receiving 499 files, 328.96 GB total. Already received 1 files, 1.32 GB 
total
Receiving 506 files, 345.33 GB total. Already received 6 files, 24.19 
MB total
Receiving 362 files, 206.73 GB total. Already received 7 files, 34 MB 
total
Receiving 424 files, 281.25 GB total. Already 

RE: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

2019-02-07 Thread Durity, Sean R
Kenneth is right. Trying to port/support a relational model to a CQL model the 
way you are doing it is not going to go well. You won’t be able to scale or get 
the search flexibility that you want. It will make Cassandra seem like a bad 
fit. You want to play to Cassandra’s strengths – availability, low latency, 
scalability, etc. so you need to store the data the way you want to retrieve it 
(query first modeling!). You could look at defining the “right” partition and 
clustering keys, so that the searches are within a single, reasonably sized 
partition. And you could have lookup tables for other common search patterns 
(item_by_model_name, etc.)
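To make the lookup-table idea concrete, here is a hypothetical sketch (table and column names assumed, loosely echoing the schema later in this thread, not anything from Sean's message):

```sql
-- Query-first modeling: one table per access pattern; the search term
-- becomes the partition key. All names are hypothetical.
CREATE TABLE item_by_model_name (
    model_name text,          -- what the user searches by
    serial     text,          -- makes rows unique within the partition
    cpe_id     text,
    sw_version text,
    PRIMARY KEY ((model_name), serial)
);
```

The application then writes each item to the base table and to every lookup table in the same logical operation, trading extra writes for cheap partition-key reads at query time.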

If that kind of modeling gets you to a situation where you have too many lookup 
tables to keep consistent, you could consider something like DataStax 
Enterprise Search (embedded SOLR) to create SOLR indexes on searchable fields. 
A SOLR query will typically be an order of magnitude slower than a partition 
key lookup, though.

It really boils down to the purpose of the data store. If you are looking for 
primarily an “anything goes” search engine, Cassandra may not be a good choice. 
If you need Cassandra-level availability, extremely low latency queries (on 
known access patterns), high volume/low latency writes, easy scalability, etc. 
then you are going to have to rethink how you model the data.


Sean Durity

From: Kenneth Brotman 
Sent: Thursday, February 07, 2019 7:01 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

Peter,

Sounds like you may need to use a different architecture.  Perhaps you need 
something like Presto or Kafka as a part of the solution.  If the data from the 
legacy system is wrong for Cassandra it’s an ETL problem?  You’d have to 
transform the data you want to use with Cassandra so that a proper data model 
for Cassandra can be used.

From: Peter Heitman [mailto:pe...@heitman.us]
Sent: Wednesday, February 06, 2019 10:05 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

Yes, I have read the material. The problem is that the application has a query 
facility available to the user where they can type in "(A = foo AND B = bar) OR 
C = chex" where A, B, and C are from a defined list of terms, many of which are 
columns in the mytable below while others are from other tables. This query 
facility was implemented and shipped years before we decided to move to 
Cassandra.
On Thu, Feb 7, 2019, 8:21 AM Kenneth Brotman 
mailto:kenbrot...@yahoo.com.invalid>> wrote:
The problem is you’re not using a query first design.  I would recommend first 
reading chapter 5 of Cassandra: The Definitive Guide by Jeff Carpenter and Eben 
Hewitt.  It’s available free online at this 
link.

Kenneth Brotman

From: Peter Heitman [mailto:pe...@heitman.us]
Sent: Wednesday, February 06, 2019 6:33 PM

To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

Yes, I "know" that allow filtering is a sign of a (possibly fatal) inefficient 
data model. I haven't figured out how to do it correctly yet.
On Thu, Feb 7, 2019, 7:59 AM Kenneth Brotman 
mailto:kenbrot...@yahoo.com.invalid>> wrote:
Exactly.  When you design your data model correctly you shouldn’t have to use 
ALLOW FILTERING in the queries.  That is not recommended.

Kenneth Brotman

From: Peter Heitman [mailto:pe...@heitman.us]
Sent: Wednesday, February 06, 2019 6:09 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

You are completely right! My problem is that I am trying to port code from SQL 
to CQL for an application that provides the user with a relatively general 
search facility. The original implementation didn't worry about secondary 
indexes; it just took advantage of the ability to create arbitrarily complex 
queries with inner joins, left joins, etc. I am reimplementing it to create a 
parse tree of CQL queries and doing the ANDs and ORs in the application. Of 
course, once I get enough of this implemented, I will have to load up the table 
with a large data set and see if it gives acceptable performance for our use 
case.
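The "ANDs and ORs in the application" approach can be sketched like this (a hypothetical illustration, not Peter's actual code): run each leaf predicate as its own CQL query, collect the matching primary keys, and combine the key sets while walking the parse tree:

```python
def evaluate(node, run_leaf):
    """Evaluate a boolean parse tree over key sets.

    node is either ('leaf', predicate) or (op, left, right) with op in
    {'and', 'or'}; run_leaf maps a predicate string to the set of primary
    keys returned by the corresponding single-predicate CQL query."""
    if node[0] == 'leaf':
        return set(run_leaf(node[1]))
    left = evaluate(node[1], run_leaf)
    right = evaluate(node[2], run_leaf)
    return left & right if node[0] == 'and' else left | right

# Stand-in for a Cassandra session: predicate -> matching primary keys.
fake_results = {'A = foo': {1, 2, 3}, 'B = bar': {2, 3, 4}, 'C = chex': {9}}

# "(A = foo AND B = bar) OR C = chex" from earlier in the thread:
tree = ('or',
        ('and', ('leaf', 'A = foo'), ('leaf', 'B = bar')),
        ('leaf', 'C = chex'))
print(sorted(evaluate(tree, lambda p: fake_results[p])))  # [2, 3, 9]
```

The obvious caveat is that each leaf query can return an unbounded key set, so in practice the leaves need LIMITs or selective predicates for this to perform acceptably.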
On Wed, Feb 6, 2019, 8:52 PM Kenneth Brotman 
mailto:kenbrotman@yahoo.com.invalid>> wrote:
Isn’t that a lot of SASI indexes for one table?  Could you denormalize 

RE: SASI queries- cqlsh vs java driver

2019-02-07 Thread Kenneth Brotman
Peter,

 

Sounds like you may need to use a different architecture.  Perhaps you need 
something like Presto or Kafka as a part of the solution.  If the data from the 
legacy system is wrong for Cassandra it’s an ETL problem?  You’d have to 
transform the data you want to use with Cassandra so that a proper data model 
for Cassandra can be used.

 

From: Peter Heitman [mailto:pe...@heitman.us] 
Sent: Wednesday, February 06, 2019 10:05 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

 

Yes, I have read the material. The problem is that the application has a query 
facility available to the user where they can type in "(A = foo AND B = bar) OR 
C = chex" where A, B, and C are from a defined list of terms, many of which are 
columns in the mytable below while others are from other tables. This query 
facility was implemented and shipped years before we decided to move to 
Cassandra.

On Thu, Feb 7, 2019, 8:21 AM Kenneth Brotman  
wrote:

The problem is you’re not using a query first design.  I would recommend first 
reading chapter 5 of Cassandra: The Definitive Guide by Jeff Carpenter and Eben 
Hewitt.  It’s available free online at this link 

 .

 

Kenneth Brotman

 

From: Peter Heitman [mailto:pe...@heitman.us] 
Sent: Wednesday, February 06, 2019 6:33 PM


To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

 

Yes, I "know" that allow filtering is a sign of a (possibly fatal) inefficient 
data model. I haven't figured out how to do it correctly yet.

On Thu, Feb 7, 2019, 7:59 AM Kenneth Brotman  
wrote:

Exactly.  When you design your data model correctly you shouldn’t have to use 
ALLOW FILTERING in the queries.  That is not recommended.

 

Kenneth Brotman

 

From: Peter Heitman [mailto:pe...@heitman.us] 
Sent: Wednesday, February 06, 2019 6:09 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

 

You are completely right! My problem is that I am trying to port code from SQL 
to CQL for an application that provides the user with a relatively general 
search facility. The original implementation didn't worry about secondary 
indexes; it just took advantage of the ability to create arbitrarily complex 
queries with inner joins, left joins, etc. I am reimplementing it to create a 
parse tree of CQL queries and doing the ANDs and ORs in the application. Of 
course, once I get enough of this implemented, I will have to load up the table 
with a large data set and see if it gives acceptable performance for our use 
case.

On Wed, Feb 6, 2019, 8:52 PM Kenneth Brotman  
wrote:

Isn’t that a lot of SASI indexes for one table? Could you denormalize more to 
reduce both columns per table and SASI indexes per table? Eight SASI indexes 
on one table seems like a lot.

 

Kenneth Brotman

 

From: Peter Heitman [mailto:pe...@heitman.us] 
Sent: Tuesday, February 05, 2019 6:59 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

 

The table and secondary indexes look generally like this Note that I have 
changed the names of many of the columns to be generic since they aren't 
important to the question as far as I know. I left the actual names for those 
columns that I've created SASI indexes for. The query I use to try to create a 
PreparedStatement is:

 

SELECT sql_id, type, cpe_id, serial, product_class, manufacturer, sw_version 
FROM mytable WHERE serial IN :v0 LIMIT :limit0 ALLOW FILTERING

 

the schema cql statements are:

 

CREATE TABLE IF NOT EXISTS mykeyspace.mytable (
  id text,
  sql_id bigint,
  cpe_id text,
  sw_version text,
  hw_version text,
  manufacturer text,
  product_class text,
  manufacturer_oui text,
  description text,
  periodic_inform_interval text,
  restricted_mode_enabled text,
  restricted_mode_reason text,
  type text,
  model_name text,
  serial text,
  mac text,
   text,
  generic0 timestamp,
  household_id text,
  generic1 int,
  generic2 text,
  generic3 text,
  generic4 int,
  generic5 int,
  generic6 text,
  generic7 text,
  generic8 text,
  generic9 text,
  generic10 text,
  generic11 timestamp,
  generic12 text,
  generic13 text,
  generic14 timestamp,
  generic15 text,
  generic16 text,
  generic17 text,
  generic18 text,
  generic19 text,
  generic20 text,
  generic21 text,
  generic22 text,
  generic23 text,
  generic24 text,
  generic25 text,
  generic26 text,
  generic27 text,
  generic28 int,
  generic29 int,
  generic30 text,
  generic31 text,
  generic32 text,
  generic33 text,
  generic34 text,
  generic35 int,
  generic36 int,
  generic37 int,
  generic38 int,
  generic39 text,
  generic40 text,
  generic41 text,
  generic42 text,
  generic43 text,

range repairs multiple dc

2019-02-07 Thread CPC
Hi All,

I searched the documentation but could not find a clear reference regarding
the -pr option. In some places it says you have to cover the whole ring; in
other places it says you have to run it on every node regardless of whether
you have multiple DCs.

In our case we have three DCs (DC1, DC2, DC3), with each DC having 4 nodes and
12 nodes in the cluster in total. If I run "nodetool repair -pr --full" on
every node in DC1, does it mean DC1 is consistent but DC2 and DC3 are not, or
that DC1 is not consistent at all? In our case we added DC3 to our cluster and
will remove DC2 from the cluster, so I don't care whether DC2 has consistent
data. I don't want to repair DC2.

Also, can I run "nodetool repair -pr --full" in parallel? I mean run it at
the same time in each DC, or run it on more than one node in the same DC? Does
the -dcpar option do the same thing?

Best Regards...