Re: openjdk for cassandra production cluster

2018-10-10 Thread Christophe Schmitz
It is fixed in 3.11.2 https://issues.apache.org/jira/browse/CASSANDRA-13916



On Thu, 11 Oct 2018 at 02:10 prachirath72  wrote:

> Thanks Jonathan,
> Is there a ticket/bug ID for removing this OpenJDK WARN?
> I want to have a look.
>
>
>  Original message 
> From: Jonathan Haddad 
> Date: 10/10/18 10:46 AM (GMT-05:00)
> To: user@cassandra.apache.org
> Subject: Re: openjdk for cassandra production cluster
>
> The warning should be removed (if it hasn’t already), it’s unnecessary at
> this point
>
> On Wed, Oct 10, 2018 at 7:41 AM Prachi Rath 
> wrote:
>
>> HI users,
>> I have created a Cassandra cluster with OpenJDK version 1.8.0_181
>> (Cassandra 2.1.17).
>> I started each node and the cluster looks healthy, but in the log files I
>> saw the WARN
>> message below:
>>
>> WARN [main] 2014-01-28 06:02:17,861 CassandraDaemon.java (line 155)
>> OpenJDK
>> is not recommended. Please upgrade to the newest Oracle Java release
>>
>> Is this WARN message informational only, or can it be a real issue?
>> Has anyone noticed something like this (or is anyone using OpenJDK in a
>> production environment)?
>>
>> Thanks ,
>> Prachi
>>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>


Re: SSTableMetadata Util

2018-10-01 Thread Christophe Schmitz
Hi Pranay,

The sstablemetadata tool is still available in the tarball
($CASSANDRA_HOME/tools/bin) in 3.11.3. Not sure why it is not available in
your packaged installation; you might want to manually copy the one from
the tarball to your /usr/bin/.

Additionally, you can have a look at
https://github.com/instaclustr/cassandra-sstable-tools which will provide
you with the desired info, plus more info you might find useful.


Christophe Schmitz - Instaclustr <https://www.instaclustr.com/> - Cassandra
| Kafka | Spark Consulting





On Tue, 2 Oct 2018 at 11:31 Pranay akula  wrote:

> Hi,
>
> I am testing Apache Cassandra 3.11.3 and I couldn't find the sstablemetadata util.
>
> All I can see are these utilities in /usr/bin:
>
> -rwxr-xr-x.   1 root root2042 Jul 25 06:12 sstableverify
> -rwxr-xr-x.   1 root root2045 Jul 25 06:12 sstableutil
> -rwxr-xr-x.   1 root root2042 Jul 25 06:12 sstableupgrade
> -rwxr-xr-x.   1 root root2042 Jul 25 06:12 sstablescrub
> -rwxr-xr-x.   1 root root2034 Jul 25 06:12 sstableloader
>
>
> If this utility is no longer available, how can I get SSTable metadata like
> repaired_at and estimated droppable tombstones?
>
>
> Thanks
> Pranay
>


Re: Cassandra loading data from another table

2018-10-01 Thread Christophe Schmitz
Have a look at using Spark on Cassandra. It's commonly used for data
movement / data migration / reconciliation (on top of analytics). You will
get much better performance.
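
For the specific case below (copying tableA into tableB without overwriting
rows already in tableB), a minimal Scala sketch with the
spark-cassandra-connector could look like the following. The keyspace, table
and column names (ks, table_a, table_b, id, ts) are placeholders for the real
schema, and writes arriving in tableB after the read can still race with the
copy:

import com.datastax.spark.connector._
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf()
  .setAppName("copy-a-to-b")
  .set("spark.cassandra.connection.host", "10.0.0.1"))

// Key each row by its full primary key, assumed here to be (id, ts).
def byKey(rdd: RDD[CassandraRow]) =
  rdd.keyBy(r => (r.getString("id"), r.getLong("ts")))

val a = byKey(sc.cassandraTable("ks", "table_a"))
val b = byKey(sc.cassandraTable("ks", "table_b"))

// Keep only the tableA rows whose key is absent from tableB, then write them,
// so nothing already present in tableB gets overwritten.
a.subtractByKey(b).values.saveToCassandra("ks", "table_b")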

Christophe Schmitz - Instaclustr <https://www.instaclustr.com/> - Cassandra
| Kafka | Spark Consulting





On Tue, 2 Oct 2018 at 09:58 Richard Xin 
wrote:

> Christophe, thanks for your insights,
> Sorry, I forgot to mention that currently both tableA and tableB are being
> updated by the application (all newly inserted/updated records should be
> identical on A and B), so exporting from tableB and COPYing it back later on
> will result in older data overwriting newly updated data.
>
> The only approach I can think of is using COPY on tableA to a csv, and then
> iterating the csv line by line to insert into tableB using the "if not
> exists" clause to avoid down-time, but it's error-prone and slow. Not sure
> whether there is a better way.
> Best,
> Richard
>
> On Monday, October 1, 2018, 4:34:38 PM PDT, Christophe Schmitz <
> christo...@instaclustr.com> wrote:
>
>
> Hi Richard,
>
> You could consider exporting your few thousands record of Table B in a
> file, with *COPY TO*. Then *TRUNCATE* Table B, copy the SSTable files of
> TableA to the data directory of Table B (make sure you *flush* the
> memtables first), then run nodetool *refresh*. Final step is to load the
> few thousands record on Table B with *COPY FROM*. This will overwrite the
> data you loaded from the SSTables of Table A.
> Overall, there is no downtime on your cluster, there is no downtime on
> Table A, yet you need to think about the consequences on Table B if your
> application is writing on Table A or Table B during this process.
> Please test first :)
>
> Cheers,
> Christophe
>
> Christophe Schmitz - Instaclustr <https://www.instaclustr.com/> -
> Cassandra | Kafka | Spark Consulting
>
>
>
>
> On Tue, 2 Oct 2018 at 09:18 Richard Xin 
> wrote:
>
> I have a tableA with a few tens of millions of records, and I have a tableB
> with a few thousand records.
> TableA and TableB have exactly the same schema (except that tableB doesn't
> have a TTL).
>
> I want to load all data to tableB from tableA EXCEPT for those already on
> tableB (we don't want data on tableB to be overwritten)
>
> What's the best way to accomplish this?
>
> Thanks,
>
>


Re: [EXTERNAL] Re: Rolling back Cassandra upgrades (tarball)

2018-10-01 Thread Christophe Schmitz
Adding to the thread:

   - SSTable format is identical between 3.0.x and 3.11.x, so your SSTable
   files are compatible in this case. BTW, an easy way to check is to look at
   the SSTable filename convention; the first letters ('mc' in this case)
   indicate the SSTable storage format version.
   - In the future, if you really want a rollback option when doing a major
   upgrade with a change of SSTable format, your only choice will be to create
   a secondary data center (same number of nodes, same Cassandra version;
   please check your keyspaces are using NetworkTopologyStrategy). You will be
   able to upgrade the Cassandra version of one DC while keeping the other DC
   on the current version. You will need to consider carefully the consistency
   level of your application (probably LOCAL_QUORUM) so that your application
   writes to one DC, with automatic replication to the secondary DC. Once you
   are happy, you can decommission the old-version DC (check carefully your
   application endpoint configuration and local_dc configuration).

Hope this helps.


Christophe Schmitz - Instaclustr <https://www.instaclustr.com/> - Cassandra
| Kafka | Spark Consulting



On Mon, 1 Oct 2018 at 23:18 Durity, Sean R 
wrote:

> Version choices aside, I am an advocate for forward-only (in most cases).
> Here is my reasoning, so that you can evaluate for your situation:
> - upgrades are done while the application is up and live and writing data
> (no app downtime)
> - the upgrade usually includes a change to the sstable version (which is
> unreadable in the older version)
> - any data written to upgraded nodes will be written in the new sstable
> format
> + this includes any compaction that takes place on upgraded nodes, so even
> an app outage doesn't protect you
> - so, there is no going back, unless you are willing to lose new (or
> compacted) data written to any upgraded nodes
>
> As you can tell, if the assumptions don't hold true, a roll back may be
> possible. For example, if the sstable version is the same (e.g., for a
> minor upgrade), then the risk of lost data is gone. Or, if you are able to
> stop your application during the upgrade process and stop compaction. Etc.
>
> You could upgrade a single node to see how it behaves. If there is some
> problem, you could wipe out the data, go back to the old version, and
> bootstrap it again. Once I get to the 2nd node, though, I am only going
> forward.
>
> Sean Durity
>
>
> -Original Message-
> From: Jeff Jirsa 
> Sent: Sunday, September 30, 2018 8:38 PM
> To: user@cassandra.apache.org
> Subject: [EXTERNAL] Re: Rolling back Cassandra upgrades (tarball)
>
> Definitely don’t go to 3.10, go to 3.11.3 or newest 3.0 instead
>
>
> --
> Jeff Jirsa
>
>
> On Sep 30, 2018, at 5:29 PM, Nate McCall  wrote:
>
> >> I have a cluster on v3.0.11 that I am planning to upgrade to 3.10.
> >> Is rolling back the binaries a viable solution?
> >
> > What's the goal with moving form 3.0 to 3.x?
> >
> > Also, our latest release in 3.x is 3.11.3 and has a couple of
> > important bug fixes over 3.10 (which is a bit dated at this point).
> >
> > -
> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: user-h...@cassandra.apache.org
> >
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>
> 
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>


Re: Cassandra loading data from another table

2018-10-01 Thread Christophe Schmitz
Hi Richard,

You could consider exporting your few thousands record of Table B in a
file, with *COPY TO*. Then *TRUNCATE* Table B, copy the SSTable files of
TableA to the data directory of Table B (make sure you *flush* the
memtables first), then run nodetool *refresh*. Final step is to load the
few thousands record on Table B with *COPY FROM*. This will overwrite the
data you loaded from the SSTables of Table A.
Overall, there is no downtime on your cluster, there is no downtime on
Table A, yet you need to think about the consequences on Table B if your
application is writing on Table A or Table B during this process.
Please test first :)

Cheers,
Christophe

Christophe Schmitz - Instaclustr <https://www.instaclustr.com/> - Cassandra
| Kafka | Spark Consulting




On Tue, 2 Oct 2018 at 09:18 Richard Xin 
wrote:

> I have a tableA with a few tens of millions of records, and I have a tableB
> with a few thousand records.
> TableA and TableB have exactly the same schema (except that tableB doesn't
> have a TTL).
>
> I want to load all data to tableB from tableA EXCEPT for those already on
> tableB (we don't want data on tableB to be overwritten)
>
> What's the best way to accomplish this?
>
> Thanks,
>


Re: Cassandra Storage per node

2018-09-27 Thread Christophe Schmitz
Hi Suresh,

Welcome to Cassandra!

Node density is an important topic with Cassandra. Depending on the node
type, data usage, and your operational expertise, you can go somewhere
between 1TB and 3TB of data per node. If you are just starting, stay below
1TB to avoid trouble. Storing too much data on a node makes it difficult to
run repairs, replace nodes, add nodes, etc.

As you plan to use STCS, you should know that in the worst-case scenario,
during major compactions, you might need about twice the volume of your
data (on a given node) to complete the compaction, so make sure you plan
for that too.

Final note concerning the type of disk: you should be perfectly fine with
SSD RAID0. If your application is read-intensive and you want the lowest
latency possible, you could go with the more expensive NVMe. If that is the
case, you should also look at using LCS instead of STCS.

Hope this helps!

Cheers,
Christophe

Christophe Schmitz - Instaclustr <https://www.instaclustr.com/> - Cassandra
| Kafka | Spark Consulting

On Fri, 28 Sep 2018 at 14:43 Suresh Rajagopal  wrote:

> Hi,
>
> I am new to Cassandra. Is there any recommended maximum data size per node
> for Cassandra 3 with STCS? Also, any recommendation on SSD RAID 0 vs NVMe
> JBOD?
>
> Thanks
> Suresh
>


Re: GUI clients for Cassandra

2018-05-04 Thread Christophe Schmitz
MV yes,
SASI not sure, I would guess yes.

On 2 May 2018 at 18:00, Hannu Kröger  wrote:

> Ah, you are correct!
>
> However, it’s not being updated anymore AFAIK. Do you know if it supports
> the latest 3.x features? SASI, MV, etc.?
>
> Hannu
>
>
> On 24 Apr 2018, at 03:45, Christophe Schmitz 
> wrote:
>
> Hi Hannu ;)
>
>
>
>>
>> I have been asked many times that what is a good GUI client for
>>> Cassandra. DevCenter is not available anymore and DataStax has a DevStudio
>>> but that’s for DSE only.
>>>
>>
>  DevCenter is still available, I just downloaded it.
>
> Cheers,
> Christophe
>
>
>
> --
>
> *Christophe Schmitz - **VP Consulting*
>
> AU: +61 4 03751980 / FR: +33 7 82022899
>
>
>
>


-- 

*Christophe Schmitz - **VP Consulting*

AU: +61 4 03751980 / FR: +33 7 82022899



Re: Version Upgrade

2018-04-25 Thread Christophe Schmitz
Hi Pranay,

You only need to upgrade your SSTables when you perform a major Cassandra
version upgrade, so you don't need to run it for upgrading in the 3.x.x
series.
One way to check which storage version your SSTables are using is to look
at the SSTable file names. They are structured as:
<version>-<generation>-big-<component>.db (e.g. mc-12-big-Data.db). The
version is a string that represents the SSTable storage format version.
The version is "mc" in the 3.x.x series.

Cheers,
Christophe



On 26 April 2018 at 06:06, Pranay akula  wrote:

> When is it necessary to upgrade SSTables? For a minor upgrade, do we need
> to run upgradesstables?
>
> I know that when we are doing a major upgrade we have to run upgradesstables
> so that SSTables will be re-written to the newer version with additional
> metadata.
>
> But do we need to run upgradesstables for upgrading from, let's say, 3.0.15
> to 3.0.16, or 3.0.y to 3.11.y?
>
>
> Thanks
> Pranay
>



-- 

*Christophe Schmitz - **VP Consulting*

AU: +61 4 03751980 / FR: +33 7 82022899



Re: GUI clients for Cassandra

2018-04-23 Thread Christophe Schmitz
Hi Hannu ;)



>
> I have been asked many times that what is a good GUI client for Cassandra.
>> DevCenter is not available anymore and DataStax has a DevStudio but that’s
>> for DSE only.
>>
>
 DevCenter is still available, I just downloaded it.

Cheers,
Christophe



-- 

*Christophe Schmitz - **VP Consulting*

AU: +61 4 03751980 / FR: +33 7 82022899



Re: A Cassandra Storage Estimation Mechanism

2018-04-18 Thread Christophe Schmitz
Hi Onmestester,

A few comments inline:

>
> 1. I'm using the real schema + > 3 nodes cluster
>

Since you are only interested in data usage, for simplicity, you could use
a single-node cluster (your computer), and use RF=1. If your production
cluster will use RF=3, you will just need to multiply by 3. This assumes
that your data model will distribute your partitions evenly across your
cluster.



> 2. Required assumptions: real input rate (200K per second, which would be
> 150 billion rows in total) and real partition count (unique partition keys:
> 1.5 million in total).
> 3. Instead of 150 billion, I'm doing 1, 10 and 100 million writes, so I
> would use 10, 100 and 1000 partitions proportionally! After each run, I
> would run 'nodetool flush'
> and, using du -sh on the keyspace dir, check the total disk usage for that
> rate; for example, for a rate of 1 million, disk usage was 90 MB, so for 150
> billion it would be 13 TB. Then I drop the schema and run the next rate.
> I would continue this until the difference between two consecutive results
> becomes a tiny number.
> I got a good estimation at a rate of 100 million. Actually, I was doing the
> estimation for an already running production cluster
> and I knew the answer beforehand (just wanted to be sure about the idea),
> and the estimation finally matched the answer! But I'm worried that it was
> accidental.
> Finally, the question: is my estimation mechanism correct, and would it be
> applicable for any estimation and any project?
>

Running a simulation like you are doing should give you a very good
estimate; that looks correct to me, as long as you don't forget to clear
the auto-snapshots after you drop your table ;o)


> If not, how should one estimate storage (how do you estimate it)?
>

Often, the data usage is driven by a single table, and often by a single
column in the table (e.g. a JSON text field of a few KB), in which case the
math is very simple and safe to do, and this gives a good start.
Ideally, a simulation should be run, e.g. using cassandra-stress. The goal
is usually to confirm the throughput / latency. As a side effect, this also
gives the disk usage.

Hope it helps!

Cheers,

Christophe


>
> Thanks in advance
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
>
>


-- 

*Christophe Schmitz - **VP Consulting*

AU: +61 4 03751980 / FR: +33 7 82022899


Re: Why nodetool cleanup should be run sequentially after node joined a cluster

2018-04-10 Thread Christophe Schmitz
Hi Mikhail,


Nodetool cleanup can add a fair amount of extra load (mostly IO) on your
Cassandra nodes. Therefore it is recommended to run it during periods of
lower cluster usage, and one node at a time, in order to limit the impact
on your cluster. There is no technical limitation that would prevent you
from running it on all nodes at the same time; it's just a precautionary
measure.

Cheers,
Christophe


On 11 April 2018 at 14:49, Mikhail Tsaplin  wrote:

> Hi,
> In https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/
> opsAddNodeToCluster.html
> there is recommendation:
> 6) After all new nodes are running, run nodetool cleanup
> <https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsCleanup.html>
>  on each of the previously existing nodes to remove the keys that no
> longer belong to those nodes. Wait for cleanup to complete on one node
> before running nodetool cleanup on the next node.
>
> I have added a new node to the cluster and am running nodetool cleanup
> according to this recommendation - but it takes nearly 10 days to complete
> on a single node. Is it safe to start it on all nodes?
>



-- 

*Christophe Schmitz - **VP Consulting*

AU: +61 4 03751980 / FR: +33 7 82022899



Re: copy from one table to another

2018-04-08 Thread Christophe Schmitz
If you need this kind of logic, you might want to consider using Spark.
It's often used for data migration.
You could load your list of partition keys into a Spark RDD, then
use joinWithCassandraTable, and write the result back to your destination
table.
Just before the join, you could use repartitionByCassandraReplica on your
RDD to get better data locality.
This documentation can be helpful:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/2_loading.md#performing-efficient-joins-with-cassandra-tables-since-12
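
A minimal Scala sketch of that approach (the keyspace "ks" is a placeholder;
"test1", "test2" and the "partition_key" column are taken from your example
below and may need adjusting):

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf()
  .setAppName("selective-copy")
  .set("spark.cassandra.connection.host", "10.0.0.1"))

// The partition keys you want to copy; in practice you would load them
// from a file or another source instead of listing them inline.
val keys = sc.parallelize(Seq("SOME_KEY_1", "SOME_KEY_2")).map(Tuple1(_))

keys
  .repartitionByCassandraReplica("ks", "test1")  // better data locality
  .joinWithCassandraTable("ks", "test1")         // fetch only those partitions
  .map { case (_, row) => row }
  .saveToCassandra("ks", "test2")                // schemas must match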

Hope it helps

Cheers,
Christophe

On 9 April 2018 at 13:09, onmstester onmstester  wrote:

> Thank you all
> I need something like this:
> insert into table test2 select * from test1 where
> partition_key='SOME_KEYS';
> The problem with copying sstables is that the original table contains some
> billions of records and I only want some hundreds of millions of records
> from the table, so after copy/pasting big sstables on so many nodes I would
> have to wait for a deletion that would take very long to complete:
> delete from test2 where partition_key != 'SOME_KEYS'
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
>
>  On Mon, 09 Apr 2018 06:14:02 +0430 *Dmitry Saprykin
> >* wrote 
>
> IMHO The best step by step description of what you need to do is here
>
> https://issues.apache.org/jira/browse/CASSANDRA-1585?
> focusedCommentId=13488959&page=com.atlassian.jira.
> plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13488959
>
> The only difference is that you need to copy data from one table only. I
> did it for a whole keyspace.
>
>
>
>
> On Sun, Apr 8, 2018 at 3:06 PM Jean Carlo 
> wrote:
>
> You can use the same procedure to restore a table from snapshot from
> datastax webpage
>
> https://docs.datastax.com/en/cassandra/2.1/cassandra/
> operations/ops_backup_snapshot_restore_t.html
> Just two modifications.
>
> After step 5, modify the names of the sstables to add the name of the table
> you want to copy to.
>
> And in step 6, copy the sstables to the right directory corresponding
> to the table you want to copy to.
>
> Be sure you have a snapshot of the source table, and ignore step 4 of
> course.
>
>
> Saludos
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>
> On Sun, Apr 8, 2018 at 6:33 PM, Dmitry Saprykin  > wrote:
>
> You can copy hardlinks to ALL SSTables from old to new table and then
> delete part of data you do not need in a new one.
>
> On Sun, Apr 8, 2018 at 10:20 AM, Nitan Kainth 
> wrote:
>
> If it is for testing and you don’t need any specific data, just copy a set
> of sstables (with all files of that sequence) into the target table's
> directory and rename them.
>
> Restart target node or run nodetool refresh
>
> Sent from my iPhone
>
> On Apr 8, 2018, at 4:15 AM, onmstester onmstester 
> wrote:
>
> Is there any way to copy some part of a table to another table in
> Cassandra? A large amount of data should be copied, so I don't want to
> fetch data to the client and stream it back to Cassandra using CQL.
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
>
>
>


-- 

*Christophe Schmitz - **VP Consulting*

AU: +61 4 03751980 / FR: +33 7 82022899



Re: Apache Cassandra start up script

2018-03-27 Thread Christophe Schmitz
Hi Anumod,

When you install Cassandra using the tarball install, you do not get a
service file installed. It becomes your responsibility to write one (or to
take one from the internet).
When you install Cassandra using a package (.deb, .rpm, ...), the installer
should also install the service files, set the conf files in the /etc
directory, etc. You then get access to commands like systemctl start/stop
cassandra.



Hope it helps!

Cheers,
Christophe

On 28 March 2018 at 11:20, Anumod Mullachery 
wrote:

> Hi All ,
>
> I’ve installed Apache Cassandra (tarball install),
> but there is no way to see the Cassandra status or a stop option.
>
>
> Does anyone have an Apache Cassandra start / stop script for Cassandra 3.12?
>
> Can some one help on this ?
>
>
> Thanks,
>
> Anumod
> Mob-718-844-3841
> PA,USA
>
> Sent from my iPhone
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


-- 

*Christophe Schmitz - **VP Consulting*

AU: +61 4 03751980 / FR: +33 7 82022899



Re: Execute an external program

2018-03-27 Thread Christophe Schmitz
Hi Earl,

You probably want to check Cassandra triggers:
http://cassandra.apache.org/doc/latest/cql/triggers.html
You can write arbitrary code that is called for every write against your
table.
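
As a rough illustration only (this is a sketch against the Cassandra 3.x
trigger API, not a full implementation; the class name and the printed
message are made up), a trigger class looks roughly like this. It has to be
packaged as a jar, dropped into the triggers/ directory on every node, and
attached with CREATE TRIGGER:

import java.util.{Collection, Collections}
import org.apache.cassandra.db.Mutation
import org.apache.cassandra.db.partitions.Partition
import org.apache.cassandra.triggers.ITrigger

// Hypothetical example class; register it with:
//   CREATE TRIGGER notify ON ks.my_table USING 'com.example.NotifyTrigger';
class NotifyTrigger extends ITrigger {
  // Called synchronously on the write path for every mutation against the
  // table the trigger is attached to -- keep it cheap, and be careful about
  // launching external programs from here.
  override def augment(update: Partition): Collection[Mutation] = {
    println(s"partition updated in ${update.metadata.ksName}.${update.metadata.cfName}")
    Collections.emptyList()  // no extra mutations to apply
  }
}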

Cheers,

Christophe


On 28 March 2018 at 10:58, Earl Lapus  wrote:

> Hi All,
>
> I may be over the edge here, but is there a way to execute an external
> program if a new row is added or if an existing row is updated on a table?
>
> Cheers,
> Earl
>
> --
> There are seven words in this sentence.
>



-- 

*Christophe Schmitz - **VP Consulting*

AU: +61 4 03751980 / FR: +33 7 82022899



Re: high latency on one node after replacement

2018-03-27 Thread Christophe Schmitz
Hi Mike,

Unlike normal EBS volumes, for which you don't need to pre-warm, I think
you need to pre-warm an EBS volume restored from a snapshot.
Have a look at this AWS doc:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html
It says that:
However, storage blocks on volumes that were restored from snapshots must
be initialized (pulled down from Amazon S3 and written to the volume)
before you can access the block. This preliminary action takes time and can
cause a significant increase in the latency of an I/O operation the first
time each block is accessed. For most applications, amortizing this cost
over the lifetime of the volume is acceptable. Performance is restored
after the data is accessed once.

I hope it helps :)

Cheers,

Christophe

On 28 March 2018 at 06:24, Mike Torra  wrote:

> Hi There -
>
> I have noticed an issue where I consistently see high p999 read latency on
> a node for a few hours after replacing the node. Before replacing the node,
> the p999 read latency is ~30ms, but after it increases to 1-5s. I am
> running C* 3.11.2 in EC2.
>
> I am testing out using EBS snapshots of the /data disk as a backup, so
> that I can replace nodes without having to fully bootstrap the replacement.
> This seems to work ok, except for the latency issue. Some things I have
> noticed:
>
> - `nodetool netstats` doesn't show any 'Completed' Large Messages, only
> 'Dropped', while this is going on. There are only a few of these.
> - the logs show warnings like this:
>
> WARN  [PERIODIC-COMMIT-LOG-SYNCER] 2018-03-27 18:57:15,655
> NoSpamLogger.java:94 - Out of 84 commit log syncs over the past 297.28s
> with average duration of 235.88ms, 86 have exceeded the configured commit
> interval by an average of 113.66ms
>   and I can see some slow queries in debug.log, but I can't figure out
> what is causing it
> - gc seems normal
>
> Could this have something to do with starting the node with the EBS
> snapshot of the /data directory? My first thought was that this is related
> to the EBS volumes, but it seems too consistent to be actually caused by
> that. The problem is consistent across multiple replacements, and multiple
> EC2 regions.
>
> I appreciate any suggestions!
>
> - Mike
>



-- 

*Christophe Schmitz - **VP Consulting*

AU: +61 4 03751980 / FR: +33 7 82022899



Re: Measuring eventual consistency latency

2018-03-25 Thread Christophe Schmitz
Hi Jeronimo,

I am not sure that will address your exact request, but did you look at
this issue (resolved in 3.8), which adds a new cross-DC latency metric?
https://issues.apache.org/jira/browse/CASSANDRA-11569

Cheers,

Christophe

On 26 March 2018 at 10:01, Jeronimo de A. Barros 
wrote:

> I'd like to know if there is a reasonable method to measure how long it
> takes for data to become available across all replica nodes in a multi-DC
> environment when using LOCAL_ONE or LOCAL_QUORUM consistency levels.
>
> If there is already a study on this topic somewhere and someone could
> point me in the right direction, it would be of great help.
>
> Thanks !
>



-- 

*Christophe Schmitz - **VP Consulting*

AU: +61 4 03751980 / FR: +33 7 82022899



Re: Deserialize Map[Int, UDT] to a case class from Spark Connector

2018-03-25 Thread Christophe Schmitz
Hi Guillermo

Which version of Spark are you using? Starting with version 2.0, Spark is
built with Scala 2.11 by default. If you are using a prior version (which
looks like it's the case, since your error message mentions Scala 2.10), you
might need to build it yourself from source with Scala 2.11 support, or to
upgrade your Spark cluster to 2.x.
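
Once the Scala versions are aligned, reading the map into case classes could
look roughly like the sketch below. The keyspace, table and UDT field names
are made up for illustration; using Option fields is the usual way to
tolerate UDT fields that are not set:

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical UDT my_udt(a int, b text) and table
// my_table(id int PRIMARY KEY, m map<int, frozen<my_udt>>);
// adjust the names and types to the real schema.
case class MyUdt(a: Option[Int], b: Option[String])
case class MyRow(id: Int, m: Map[Int, MyUdt])

val sc = new SparkContext(new SparkConf()
  .setAppName("read-udt-map")
  .set("spark.cassandra.connection.host", "10.0.0.1"))

val rows = sc.cassandraTable[MyRow]("ks", "my_table")
rows.take(10).foreach(println)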

Cheers,

Christophe


On 26 March 2018 at 09:11, Guillermo Ortiz  wrote:

> Hello,
>
> I'm working with UDT's and spark connector with these dependencies:
>
> scala.version: 2.11.12
> spark.version: 2.0.2
> cassandra-conector.version: 2.0.7
> cassandra-driver.version: 3.4.0
>
>
> 
> org.apache.spark
> spark-core_2.11
> ${spark.version}
> 
>
> 
> org.apache.spark
> spark-streaming_2.11
> ${spark.version}
> 
>
>
> 
> com.datastax.spark
> spark-cassandra-connector_2.11
> ${cassandra-conector.version}
> 
>
> 
> com.datastax.cassandra
> cassandra-driver-core
> ${cassandra-driver.version}
> 
>
> So, with these dependencies I'm using Scala 2.11, but I get this error: *the
> GettableToMappedTypeConverter which can't deserialize TypeTags due
> to Scala 2.10 TypeTag limitation. They come back as nulls and therefore
> you see this NPE.*
>
> Why do I get this error if I'm using Scala 2.11? I want to read a
> Map[Int, MyUDT]
> from Spark with the connector. The problem is that it fails if any field
> is not set.
>
> If all fields are set, it works.
>
>
>
> Exception in thread "main" org.apache.spark.SparkException: Job aborted
> due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent
> failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): 
> java.lang.NullPointerException:
> Requested a TypeTag of the GettableToMappedTypeConverter which can't
> deserialize TypeTags due to Scala 2.10 TypeTag limitation. They come back
> as nulls and therefore you see this NPE.
> at com.datastax.spark.connector.rdd.reader.
> GettableDataToMappedTypeConverter.targetTypeTag(
> GettableDataToMappedTypeConverter.scala:34)
> at com.datastax.spark.connector.types.TypeConverter$
> AbstractMapConverter.valueTypeTag(TypeConverter.scala:707)
> at com.datastax.spark.connector.types.TypeConverter$
> MapConverter$$typecreator45$1.apply(TypeConverter.scala:791)
> at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$
> lzycompute(TypeTags.scala:232)
> at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:232)
> at com.datastax.spark.connector.types.TypeConverter$class.
> targetTypeName(TypeConverter.scala:36)
> at com.datastax.spark.connector.types.TypeConverter$
> CollectionConverter.targetTypeName(TypeConverter.scala:682)
> at com.datastax.spark.connector.rdd.reader.
> GettableDataToMappedTypeConverter.tryConvert(
> GettableDataToMappedTypeConverter.scala:156)
>



-- 

*Christophe Schmitz - **VP Consulting*

AU: +61 4 03751980 / FR: +33 7 82022899



Re: Cassandra CF Level Metrics (Read, Write Count and Latency)

2017-08-31 Thread Christophe Schmitz
Hi Jai,

The ReadLatency MBean exposes a few metrics, including the Count one, which
is the total number of read requests you are after.
See the attached screenshot.
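
If it helps, here is a minimal Scala sketch of reading those attributes over
JMX. It assumes JMX is reachable on localhost:7199 without authentication,
and a hypothetical keyspace "ks" with a table "events"; on older Cassandra
versions the MBean type is ColumnFamily rather than Table:

import javax.management.ObjectName
import javax.management.remote.{JMXConnectorFactory, JMXServiceURL}

val url = new JMXServiceURL(
  "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi")
val connector = JMXConnectorFactory.connect(url)
val mbsc = connector.getMBeanServerConnection

val readLatency = new ObjectName(
  "org.apache.cassandra.metrics:type=Table,keyspace=ks,scope=events,name=ReadLatency")

// Count = total read requests served so far; OneMinuteRate = reads/s (an
// exponentially weighted rate), which is why the one-minute rate will not
// match the total request count.
println("total reads: " + mbsc.getAttribute(readLatency, "Count"))
println("reads/s:     " + mbsc.getAttribute(readLatency, "OneMinuteRate"))
connector.close()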

Cheers,

Christophe

On 1 September 2017 at 09:21, Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> I did look at the document and tried setting up the metric as follows, but
> it does not match the total read requests. I am using
> "ReadLatency_OneMinuteRate"
>
> /org.apache.cassandra.metrics:type=ColumnFamily,keyspace=*,
> scope=*,name=ReadLatency
>
> On Thu, Aug 31, 2017 at 4:17 PM, Christophe Schmitz <
> christo...@instaclustr.com> wrote:
>
>> Hello Jai,
>>
>> Did you have a look at the following page:
>> http://cassandra.apache.org/doc/latest/operating/metrics.html
>>
>> In your case, you would want the following MBeans:
>> org.apache.cassandra.metrics:type=Table,keyspace=<keyspace>,scope=<table>,name=<MetricName>
>> with MetricName set to ReadLatency and WriteLatency
>>
>> Cheers,
>>
>> Christophe
>>
>>
>>
>> On 1 September 2017 at 09:08, Jai Bheemsen Rao Dhanwada <
>> jaibheem...@gmail.com> wrote:
>>
>>> Hello All,
>>>
>>> I am looking to capture the CF level Read, Write count and Latency. As
>>> of now I am using Telegraf plugin to capture the JMX metrics.
>>>
>>> What is the MBeans, scope and metric to look for the CF level metrics?
>>>
>>>
>>
>>
>>
>


-- 


*Christophe Schmitz*
*Director of consulting EMEA*
AU: +61 4 03751980 / FR: +33 7 82022899



-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org

Re: Cassandra CF Level Metrics (Read, Write Count and Latency)

2017-08-31 Thread Christophe Schmitz
Hello Jai,

Did you have a look at the following page:
http://cassandra.apache.org/doc/latest/operating/metrics.html

In your case, you would want the following MBeans:
org.apache.cassandra.metrics:type=Table,keyspace=<keyspace>,scope=<table>,name=<MetricName>
with MetricName set to ReadLatency and WriteLatency

Cheers,

Christophe


On 1 September 2017 at 09:08, Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> Hello All,
>
> I am looking to capture the CF level Read, Write count and Latency. As of
> now I am using Telegraf plugin to capture the JMX metrics.
>
> What is the MBeans, scope and metric to look for the CF level metrics?
>
>


Re: Getting all unique keys

2017-08-21 Thread Christophe Schmitz
Hi Avi,

The Spark project documentation is quite good, as is the
spark-cassandra-connector GitHub project, which contains some basic
examples you can easily get inspired from. A few random pieces of advice you
might find useful:
- You will want one Spark worker on each node, and a Spark master on either
one of the nodes, or on a separate node.
- Pay close attention to your port configuration (firewall), as the Spark
error log does not always give you the right hint.
- Pay close attention to your heap sizes. Make sure to configure them such
that Cassandra heap size + Spark heap size < your node memory (take
into account Cassandra off-heap usage if enabled, the OS, etc.).
- If your Cassandra data center is used in production, make sure you
throttle reads / writes from Spark, pay attention to your latencies, and
consider using a separate analytics Cassandra data center if you get serious
with Spark (see the SparkConf sketch below).
- More or less everyone I know finds that writing Spark jobs in Scala is
natural, while writing them in Java is painful :D

Getting Spark running will be a bit of an investment at the beginning, but
overall you will find it allows you to run queries you can't naturally
run in Cassandra, like the one you described.
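
As a small sketch of the heap and throttling knobs mentioned in the list
above (the property names are the Spark / spark-cassandra-connector 2.0 ones,
so double-check them against the connector reference for your version; the
values are only placeholders):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("analytics-job")
  .set("spark.cassandra.connection.host", "10.0.0.1")
  // Keep Cassandra heap (+ off-heap) + executor memory below the node's RAM.
  .set("spark.executor.memory", "4g")
  // Throttle writes back to a production data center.
  .set("spark.cassandra.output.throughput_mb_per_sec", "5")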

Cheers,

Christophe

On 21 August 2017 at 16:16, Avi Levi  wrote:

> Thanks Christophe,
> we didn't want to add too many moving parts, but it sounds like a good
> solution. Do you have any reference / link that I can look at?
>
> Cheers
> Avi
>
> On Mon, Aug 21, 2017 at 3:43 AM, Christophe Schmitz <
> christo...@instaclustr.com> wrote:
>
>> Hi Avi,
>>
>> Have you thought of using Spark for that work? If you collocate the Spark
>> workers on the Cassandra nodes, the spark-cassandra connector will
>> automatically split the token ranges for you in such a way that each Spark
>> worker only hits the local Cassandra node. This will also be done in
>> parallel. It should be much faster that way.
>>
>> Cheers,
>> Christophe
>>
>>
>> On 21 August 2017 at 01:34, Avi Levi  wrote:
>>
>>> Thank you very much , one question . you wrote that I do not need
>>> distinct here since it's a part from the primary key. but only the
>>> combination is unique (*PRIMARY KEY (id, timestamp) ) .* also if I take
>>> the last token and feed it back as you showed wouldn't I get overlapping
>>> boundaries ?
>>>
>>> On Sun, Aug 20, 2017 at 6:18 PM, Eric Stevens  wrote:
>>>
>>>> You should be able to fairly efficiently iterate all the partition keys
>>>> like:
>>>>
>>>> select id, token(id) from table where token(id) >= -9204925292781066255
>>>> limit 1000;
>>>>  id | system.token(id)
>>>> +--
>>>> ...
>>>>  0xb90ea1db5c29f2f6d435426dccf77cca6320fac9 | -7821793584824523686
>>>>
>>>> Take the last token you receive and feed it back in, skipping
>>>> duplicates from the previous page (on the unlikely chance that you have two
>>>> ID's with a token collision on the page boundary):
>>>>
>>>> select id, token(id) from table where token(id) >=
>>>> -7821793584824523686 limit 1000;
>>>>  id | system.token(id)
>>>> +-
>>>> ...
>>>>  0xc6289d729c9087fb5a1fe624b0b883ab82a9bffe | -434806781044590339
>>>>
>>>> Continue until you have no more results.  You don't really need
>>>> distinct here: it's part of your primary key, it must already be distinct.
>>>>
>>>> If you want to parallelize it, split the ring into *n* ranges and
>>>> include it as an upper bound for each segment.
>>>>
>>>> select id, token(id) from table where token(id) >= -9204925292781066255
>>>> AND token(id) < $rangeUpperBound limit 1000;
>>>>
>>>>
>>>> On Sun, Aug 20, 2017 at 12:33 AM Avi Levi  wrote:
>>>>
>>>>> I need to get all unique keys (not the complete primary key, just the
>>>>> partition key) in order to aggregate all the relevant records of that key
>>>>> and apply some calculations on it.
>>>>>
>>>>> *CREATE TABLE my_table (
>>>>>
>>>>> id text,
>>>>>
>>>>> timestamp bigint,
>>>>>
>>>>> value double,
>>>>>
>>>>> PRI

Re: Getting all unique keys

2017-08-20 Thread Christophe Schmitz
Hi Avi,

Have you thought of using Spark for that work? If you collocate the Spark
workers on the Cassandra nodes, the spark-cassandra connector will
automatically split the token ranges for you in such a way that each Spark
worker only hits the local Cassandra node. This will also be done in
parallel. It should be much faster that way.
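
A minimal Scala sketch of what that looks like with the
spark-cassandra-connector, assuming a hypothetical keyspace "ks" and the
my_table schema from your question (partition key "id"):

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf()
  .setAppName("unique-ids")
  .set("spark.cassandra.connection.host", "10.0.0.1"))

val uniqueIds = sc.cassandraTable("ks", "my_table")
  .select("id")              // only pull the partition key column
  .map(_.getString("id"))
  .distinct()                // one entry per partition key

// From here you can aggregate per key, or simply count them.
println(s"unique partition keys: ${uniqueIds.count()}")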

Cheers,
Christophe


On 21 August 2017 at 01:34, Avi Levi  wrote:

> Thank you very much, one question: you wrote that I do not need distinct
> here since it's part of the primary key, but only the combination is
> unique (*PRIMARY KEY (id, timestamp)*). Also, if I take the last token
> and feed it back as you showed, wouldn't I get overlapping boundaries?
>
> On Sun, Aug 20, 2017 at 6:18 PM, Eric Stevens  wrote:
>
>> You should be able to fairly efficiently iterate all the partition keys
>> like:
>>
>> select id, token(id) from table where token(id) >= -9204925292781066255
>> limit 1000;
>>  id | system.token(id)
>> +--
>> ...
>>  0xb90ea1db5c29f2f6d435426dccf77cca6320fac9 | -7821793584824523686
>>
>> Take the last token you receive and feed it back in, skipping duplicates
>> from the previous page (on the unlikely chance that you have two ID's with
>> a token collision on the page boundary):
>>
>> select id, token(id) from table where token(id) >=
>> -7821793584824523686 limit 1000;
>>  id | system.token(id)
>> +-
>> ...
>>  0xc6289d729c9087fb5a1fe624b0b883ab82a9bffe | -434806781044590339
>>
>> Continue until you have no more results.  You don't really need distinct
>> here: it's part of your primary key, it must already be distinct.
>>
>> If you want to parallelize it, split the ring into *n* ranges and
>> include it as an upper bound for each segment.
>>
>> select id, token(id) from table where token(id) >= -9204925292781066255
>> AND token(id) < $rangeUpperBound limit 1000;
>>
>>
>> On Sun, Aug 20, 2017 at 12:33 AM Avi Levi  wrote:
>>
>>> I need to get all unique keys (not the complete primary key, just the
>>> partition key) in order to aggregate all the relevant records of that key
>>> and apply some calculations on it.
>>>
>>> *CREATE TABLE my_table (
>>>
>>> id text,
>>>
>>> timestamp bigint,
>>>
>>> value double,
>>>
>>> PRIMARY KEY (id, timestamp) )*
>>>
>>> I know that to query like this
>>>
>>> *SELECT DISTINCT id FROM my_table *
>>>
>>> is not very efficient but how about the approach presented here 
>>> <http://www.scylladb.com/2017/02/13/efficient-full-table-scans-with-scylla-1-6/>
>>>  sending queries in parallel and using the token
>>>
>>> *SELECT DISTINCT id FROM my_table WHERE token(id) >= -9204925292781066255 
>>> AND token(id) <= -9223372036854775808; *
>>>
>>> *or I can just maintain another table with the unique keys *
>>>
>>> *CREATE TABLE id_only ( id text,
>>>
>>> PRIMARY KEY (id) )*
>>>
>>> but I tend not to since it is error prone and will enforce other procedures 
>>> to maintain data integrity between those two tables .
>>>
>>> any ideas ?
>>>
>>> Thanks
>>>
>>> Avi
>>>
>>>
>


-- 


*Christophe Schmitz*
*Director of consulting EMEA*


Re: Cassandra Writes Duplicated/Concatenated List Data

2017-08-16 Thread Christophe Schmitz
Hi Nathan,


The code may occasionally write to the same row multiple times.
>
>
Can you run a test using IF NOT EXISTS in your inserts to see if that makes
a difference? It shouldn't change anything, but I don't see what the
problem might be at the moment.


-- 


*Christophe Schmitz**Director of consulting EMEA*


Re: Large tombstones creation

2017-08-13 Thread Christophe Schmitz
Hi Vlad,

Are you by any chance inserting null values? If so, you will create
tombstones. The workaround (Cassandra >= 2.2) is to leave those values unset
on your bound statement (see
https://issues.apache.org/jira/browse/CASSANDRA-7304).
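
For illustration, here is what that looks like with the DataStax Java driver
3.x from Scala (in the Python driver the equivalent is to pass UNSET_VALUE,
or simply not bind the value, instead of None); the keyspace, table and
column names are placeholders:

import com.datastax.driver.core.Cluster

val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
val session = cluster.connect("ks")

val ps = session.prepare("INSERT INTO my_table (id, ts, value) VALUES (?, ?, ?)")

// Bind only the columns that actually have a value; with protocol v4
// (Cassandra 2.2+) an unbound variable is sent as UNSET, so no null --
// and therefore no tombstone -- is written for "value".
val bound = ps.bind()
  .setString("id", "abc")
  .setLong("ts", 42L)
// .setDouble("value", 1.0)   // deliberately left unset

session.execute(bound)
cluster.close()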

Cheers,

Christophe

On 13 August 2017 at 20:48, Vlad  wrote:

> Hi,
>
> I insert about 45000 rows into an empty table in Python using prepared
> statements and IF NOT EXISTS. While reading after the insert I get warnings
> like:
> *Server warning: Read 5000 live rows and 33191 tombstone cells for query
> SELECT * FROM ...  LIMIT 5000 (see tombstone_warn_threshold)*
>
> How can this happen? I have several SASI indexes on this table; can this be
> the reason?
>
> Regards, Vlad
>



-- 


*Christophe Schmitz*
*Director of consulting EMEA*
AU: +61 4 03751980 / FR: +33 7 82022899




DateTieredCompactionStrategy DTCS sometimes stop dropping SSTables

2015-07-20 Thread Christophe Schmitz
Hi there,

I am running a 6-node cluster on 2.1.7 with a table using DTCS to store
time series data for up to 12 hours (using a TTL of 12h). Data are written
as they arrive, without any updates or explicit deletes.
During the first 12h, the cluster gets filled with data, and a bit later
the amount of data stored remains stable. I naturally observe that the
oldest SSTables are not much older than 12h. However, at some point, the
data on a few nodes started growing at the same rate as during the initial
fill. Looking at the data dir, I observed that on those nodes, SSTables
that should have been dropped are still around (unlike on the other nodes).
I dropped the keyspace and started from scratch. Again, a node started to
store more data. I waited longer, and at some point the obsolete SSTables
got dropped.
Now I don't understand why there is this behaviour, and I was hoping
someone could shed some light on it. Or maybe it is a bug? Anyone?


For info, the compaction parameters I am using are:
compaction = {'min_threshold': '16', 'class':
'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy',
'max_sstable_age_days': '0.05', 'base_time_seconds': '300',
'max_threshold': '16'}
with gc_grace_seconds set to 0.


And attached is a plot on opscenter.

Thanks!

Christophe