Apache Cassandra meetup @ Instagram HQ

2019-02-18 Thread dinesh.jo...@yahoo.com.INVALID
Hi all,

Apologies for the cross-post. In case you're in the SF Bay Area, Instagram is 
hosting a meetup, with talks on Cassandra traffic management and Cassandra on 
Kubernetes. Details are at the link below -
https://www.eventbrite.com/e/cassandra-traffic-management-at-instagram-cassandra-and-k8s-with-instaclustr-tickets-54986803008

Thanks,
Dinesh

Re: Maximum memory usage

2019-02-06 Thread dinesh.jo...@yahoo.com.INVALID
Are you running any nodetool commands during that period? IIRC, this is a log 
entry emitted by the BufferPool. It should be harmless unless it's happening very 
often or logging an OOM.
Dinesh 
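
For reference, the 512MiB cap in that log line corresponds to the size of the 
off-heap buffer pool, which is configurable in cassandra.yaml. A sketch of the 
relevant settings (the shipped default is shown; buffer_pool_use_heap_if_exhausted 
may not exist in every version):

    # cassandra.yaml - off-heap buffer pool used for reads; the NoSpamLogger
    # message appears when a chunk cannot be allocated within this cap.
    file_cache_size_in_mb: 512
    # Optionally fall back to on-heap buffers when the pool is exhausted.
    buffer_pool_use_heap_if_exhausted: true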

On Wednesday, February 6, 2019, 6:19:42 AM PST, Rahul Reddy 
 wrote:  
 
 Hello,
I see maximum memory usage alerts in my system.log a couple of times a day, logged as 
INFO. So far I haven't seen any issue with the db. Why are those messages logged in 
system.log? Is there any impact on reads/writes with those warnings, and what 
needs to be looked at?
INFO  [RMI TCP Connection(170917)-127.0.0.1] 2019-02-05 23:15:47,408 
NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot 
allocate chunk of 1.000MiB

Thanks in advance  

Re: Two datacenters with one cassandra node in each datacenter

2019-02-06 Thread dinesh.jo...@yahoo.com.INVALID
You also want to use Cassandra with a minimum of 3 nodes.
Dinesh 
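
For context, replication across two datacenters is normally declared with 
NetworkTopologyStrategy. A minimal sketch, with the keyspace and datacenter names 
(app_ks, dc1, dc2) as placeholders; note that with a single node per datacenter the 
per-DC replication factor cannot exceed 1, which leaves no redundancy inside either DC:

    CREATE KEYSPACE app_ks
      WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'dc1': 1,
        'dc2': 1
      };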

On Wednesday, February 6, 2019, 11:26:07 PM PST, dinesh.jo...@yahoo.com 
 wrote:  
 
 Hey Kunal,
Can you add more details about the size of data, read/write throughput, what 
are your latency expectations, etc? What do you mean by "performance" issue 
with replication? Without these details it's a bit tough to answer your 
questions.
Dinesh 

On Wednesday, February 6, 2019, 3:47:05 PM PST, Kunal 
 wrote:  
 
 Hi all,
I need some recommendations on using two datacenters with one node in each 
datacenter.

In our organization, we are trying to have two Cassandra datacenters with only one 
node on each side. From the preliminary investigation, I see replication is 
happening, but I want to know if we can use this deployment in production. Will 
there be any performance issue with replication?

We have already set up 2 datacenters with one node in each datacenter, and 
replication is working fine.

Can you please let me know if this kind of setup is recommended for a production 
deployment?
Thanks in anticipation.
Regards,
Kunal Vaid


Re: Bootstrap keeps failing

2019-02-06 Thread dinesh.jo...@yahoo.com.INVALID
Would it be possible for you to take a thread dump & logs and share them?
Dinesh 
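
A sketch of how to capture those, assuming a package install with logs under 
/var/log/cassandra:

    pid=$(pgrep -f CassandraDaemon)              # the Cassandra JVM
    jstack "$pid" > /tmp/cassandra-threads.txt   # thread dump to share
    # Logs worth attaching alongside the dump:
    #   /var/log/cassandra/system.log
    #   /var/log/cassandra/debug.log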

On Wednesday, February 6, 2019, 10:09:11 AM PST, Léo FERLIN SUTTON 
 wrote:  
 
 Hello!
I am having a recurrent problem when trying to bootstrap a few new nodes.
Some general info:
   - I am running Cassandra 3.0.17
   - We have about 30 nodes in our cluster
   - All healthy nodes have between 60% and 90% used disk space on /var/lib/cassandra

So I create a new node and let auto_bootstrap do its job. After a few days the 
bootstrapping node stops streaming new data but is still not a member of the 
cluster; `nodetool status` says the node is still joining.
When this happens I run `nodetool bootstrap resume`. This usually ends in one of 
two ways:
   - The node fills up to 100% disk space and crashes.
   - The bootstrap resume finishes with errors.
When I look at `nodetool netstats -H` it looks like `bootstrap resume` does not 
resume but restarts a full transfer of all data from every node.
This is the output I get from `nodetool bootstrap resume`:

[2019-02-06 01:39:14,369] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-225-big-Data.db
 (progress: 2113%)

[2019-02-06 01:39:16,821] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-88-big-Data.db
 (progress: 2113%)

[2019-02-06 01:39:17,003] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-89-big-Data.db
 (progress: 2113%)

[2019-02-06 01:39:17,032] session with /10.16.XX.YYY complete (progress: 2113%)

[2019-02-06 01:41:15,160] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-220-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:02,864] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-226-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:09,284] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-227-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:10,522] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-228-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:10,622] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-229-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:11,925] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-90-big-Data.db
 (progress: 2114%)

[2019-02-06 01:42:14,887] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-91-big-Data.db
 (progress: 2114%)

[2019-02-06 01:42:14,980] session with /10.16.XX.ZZZ complete (progress: 2114%)

[2019-02-06 01:42:14,980] Stream failed

[2019-02-06 01:42:14,982] Error during bootstrap: Stream failed

[2019-02-06 01:42:14,982] Resume bootstrap complete

The bootstrap `progress` goes way over 100% and eventually fails.

Right now I have a node with this output from `nodetool status`: `UJ  
10.16.XX.YYY  2.93 TB  256  ?  5788f061-a3c0-46af-b712-ebeecd397bf7  c`
It is almost filled with data, yet if I look at `nodetool netstats`:
        Receiving 480 files, 325.39 GB total. Already received 5 files, 68.32 
MB total
        Receiving 499 files, 328.96 GB total. Already received 1 files, 1.32 GB 
total
        Receiving 506 files, 345.33 GB total. Already received 6 files, 24.19 
MB total
        Receiving 362 files, 206.73 GB total. Already received 7 files, 34 MB 
total
        Receiving 424 files, 281.25 GB total. Already received 1 files, 1.3 GB 
total
        Receiving 581 files, 349.26 GB total. Already received 8 files, 45.96 
MB total
        Receiving 443 files, 337.26 GB total. Already received 6 files, 96.15 
MB total
        Receiving 424 files, 275.23 GB total. Already received 5 files, 42.67 
MB total

It is trying to pull all the data again.
Am I missing something about the way `nodetool bootstrap resume` is supposed to 
be used?
Regards,
Leo
  

Re: Modeling Time Series data

2019-01-11 Thread dinesh.jo...@yahoo.com.INVALID
Hi Akash,
There are a lot of interesting articles written around this topic:

   - http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html
   - https://medium.com/netflix-techblog/scaling-time-series-data-storage-part-i-ec2b6d44ba39

You shouldn't need to worry about hotspots if you select the partition key 
carefully and your cluster is configured properly. Please go through the links, 
and if you need more clarification, feel free to ask more questions here.
Thanks,
Dinesh 
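
A minimal sketch of the bucketing approach those articles describe; the table and 
column names here are illustrative, not from the original post:

    CREATE TABLE sensor_readings (
        sensor_id text,
        day       date,        -- time bucket included in the partition key
        ts        timestamp,
        value     double,
        PRIMARY KEY ((sensor_id, day), ts)
    ) WITH CLUSTERING ORDER BY (ts DESC);

Because the partition key combines a high-cardinality identifier with the time 
bucket, writes for any given hour or day are spread across the cluster instead of 
landing on a single node.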

On Friday, January 11, 2019, 2:45:42 PM PST, Akash Gangil 
 wrote:  
 
 Hi, 

I have a data model where the partition key for a lot of tables is based on time 
(year, month, day, hour).
Would this create a hotspot in my cluster, given all the writes/reads would go 
to the same node for a given hour? Or does the Cassandra storage engine also 
take into account table info, like the table name, when distributing the data?
If the above model would be a problem, what's the suggested way to solve it? 
Add the table name to the partition key?

-- 
Akash
  

Re: Cassandra lucene secondary indexes

2018-12-13 Thread dinesh.jo...@yahoo.com.INVALID
Providing logs or more technical information might be helpful. If it is a 
cassandra-lucene related issue, perhaps it would be better to open an issue in 
their GitHub repo?
Dinesh 
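
A few commands that can help gather that information from the joining node before 
filing the issue (a sketch; the log path assumes a default package install):

    nodetool compactionstats -H     # shows the hung "Secondary index build" task
    nodetool netstats -H            # streaming/bootstrap state
    grep -Ei "secondaryindex|lucene" /var/log/cassandra/system.log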

On Wednesday, December 12, 2018, 11:17:06 PM GMT+5:30, Brian Spindler 
 wrote:  
 
 Hi all, we recently started using the cassandra-lucene secondary index support 
that Instaclustr recently assumed ownership of - thank you, btw!
We are experiencing a strange issue where adding/removing nodes fails: the 
joining node is left hung with a "Secondary index build" compaction that just 
never completes.
We're running v3.11.3 of Cassandra and the plugin; has anyone experienced this 
before?
It's a relatively small cluster (~6 nodes) in our user acceptance environment, so 
not a lot of load either.
Thanks! 
-- 
-Brian  

Re: Migrating from DSE5.1.2 to Opensource cassandra

2018-12-04 Thread dinesh.jo...@yahoo.com.INVALID
Thanks, nice summary of the overall process.
Dinesh 

On Tuesday, December 4, 2018, 9:38:47 PM EST, Jonathan Koppenhofer 
 wrote:  
 
 Unfortunately, we found this to be a little tricky. We did migrations from DSE 
4.8 and 5.0 to OSS 3.0.x, so you may run into additional issues. I will also 
say your best option may be to install a fresh cluster and stream the data. 
This wasn't feasible for us at our size and scale, in the time frames, and with 
the infrastructure restrictions we had. I will have to review my notes for more 
detail, but off the top of my head, for an in-place migration...

Pre-upgrade:
* Be sure you are not using any Enterprise features like Search or Graph. Not 
only are there no equivalent features in open source, but these features require 
proprietary classes to be on the classpath, or Cassandra will not even start up.
* By default, I think DSE uses its own custom authenticators, authorizers, and 
such. Make sure what you are doing has an open source equivalent.
* The DSE system keyspaces use custom replication strategies. Convert these to 
NTS before the upgrade (see the sketch below).
* Otherwise, follow the same processes you would do before any upgrade (repair, 
snapshot, etc.)

Upgrade:
* The easy part is just replacing the binaries as you would in a normal upgrade. 
Drain and stop the existing node first. You can do this in a rolling fashion to 
maintain availability. In our case, we were doing an in-place upgrade and reusing 
the same IPs.
* DSE unfortunately creates a custom column in a system table that requires you 
to remove one (or more) system tables (peers?) to be able to start the node. You 
delete these system tables by removing the sstables on disk while the node is 
down. This is a bit of a headache if using vnodes: as we were, it required us to 
manually specify num_tokens, and the specific tokens the node was responsible 
for, in cassandra.yaml before starting the node. If not using vnodes, this is 
simpler, but we used vnodes. Again, I'll double check my notes. Once the node is 
up, you can revert to your normal vnodes/num_tokens settings.

Post upgrade:
* Drop the DSE system tables.

I'll revert with more detail if needed.
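
The NTS conversion mentioned in the pre-upgrade list can be done with an ALTER 
KEYSPACE per DSE keyspace before the upgrade. A sketch, assuming a datacenter 
named DC1 and the dse_system keyspace; the exact keyspace names and replication 
factors depend on the DSE version and your topology:

    ALTER KEYSPACE dse_system
      WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};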

On Tue, Dec 4, 2018, 5:46 PM Nandakishore Tokala 

Re: request_scheduler functionalities for CQL Native Transport

2018-11-28 Thread dinesh.jo...@yahoo.com.INVALID
I think what you're looking for might be solved by CASSANDRA-8303. However, I 
am not sure if anybody is working on it. Generally you want to create different 
clusters for users to physically isolate them. What you propose has been 
discussed in the past and it is something that is currently unsupported.
Dinesh 
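
For reference, the Thrift-only throttle the question refers to is configured 
roughly like this in cassandra.yaml (a sketch; it has no effect on the CQL native 
transport):

    request_scheduler: org.apache.cassandra.scheduler.RoundRobinScheduler
    request_scheduler_id: keyspace
    request_scheduler_options:
        throttle_limit: 80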

On Tuesday, November 27, 2018, 11:05:32 PM PST, Shaurya Gupta 
 wrote:  
 
 Hi,
We want to throttle the maximum number of queries on any keyspace for clients 
connecting via the CQL native transport. This option is available for clients 
connecting via Thrift through the request_scheduler property in cassandra.yaml. 
Is there some option available for clients connecting via the CQL native 
transport? If not, is there any plan to add one in the future? It is a must-have 
feature if we want to support multiple teams on a single Cassandra cluster, or to 
prevent one keyspace from interfering with the performance of the other keyspaces.
Regards,
Shaurya Gupta


  

Re: nodetool rebuild

2018-09-15 Thread dinesh.jo...@yahoo.com.INVALID
It's a long shot, but do you have stream_throughput_outbound_megabits_per_sec or 
inter_dc_stream_throughput_outbound_megabits_per_sec set to a low value?
You're right in that 3.0 streaming uses 1 thread each for the incoming and 
outgoing connection per peer. It not only reads the bytes off of the channel but 
also deserializes the partitions on that same thread. If you see high CPU use 
by the STREAM-IN thread then your streaming is CPU bound. In this situation a 
more powerful CPU will definitely help. Dropping internode compression and 
encryption will also help. Are your SSTables compressed?
Dinesh 

On Friday, September 14, 2018, 4:15:28 AM PDT, Vitali Dyachuk 
 wrote:  
 
 None of these throttling settings are helpful for streaming if you have even 
150-200 Mbit/s of bandwidth, which is affordable in any cloud. Tweaking network 
TCP memory, window size etc. does not help; the bottleneck is not the network.
These are my findings on how streaming is limited in C* 3.0.*:

1) Streaming of the particular range which needs to be streamed to the new node 
is limited to a single thread, and no tweaking of CPU affinity etc. helps; 
probably a more powerful compute VM would help.
2) Disabling internode_compression and disabling compression per table helps a 
bit in our case.
3) When streaming has been dropped there is no resume available for the streamed 
range, so it will start from the beginning.

One of the options could be to create snapshots of sstables on the source node, 
copy all sstable snapshots to the new node and then run repair - data is ~5TB, 
RF=3?
How is it possible at all to stream data fast to a new node/nodes?

Vitali.
On Wed, Sep 12, 2018 at 5:02 PM Surbhi Gupta  wrote:

Increase 3 throughputs: compaction throughput, stream throughput, and interdc 
stream throughput (if rebuilding from another DC).
Set all of the above to 0 and see if there is any improvement, and later set them 
back if you can't leave these values at 0.
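
A sketch of the corresponding nodetool commands; 0 means unthrottled, and sensible 
values should be restored once the rebuild finishes:

    nodetool setcompactionthroughput 0
    nodetool setstreamthroughput 0
    nodetool setinterdcstreamthroughput 0
    nodetool getstreamthroughput      # verify the current value
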
On Wed, Sep 12, 2018 at 5:42 AM Vitali Dyachuk  wrote:

Hi,
I'm currently streaming data with nodetool rebuild on 2 nodes; each node is 
streaming from a different location. The problem is that it takes ~7 days to 
stream 4TB of data to 1 node. The speed on each side is ~150Mbit/s, so it should 
take around ~2.5 days, although there are resources available on the destination 
nodes and in the source regions.
I've increased stream throughput, but it only affects outbound connections. 
Tested with iperf, the bandwidth is 600Mbit/s from both sides. Last week I 
changed the compaction strategy from STCS to LCS because of huge sstables, and 
compaction of them is still ongoing.
How does the rebuild command work? Does it calculate the range, then request the 
needed sstables from that node and start streaming? How is it possible to speed 
up the streaming?
Vitali.

  

Re: AxonOps

2018-09-15 Thread dinesh.jo...@yahoo.com.INVALID
Are you planning to open source it, or will it be a binary-only distribution?
Dinesh 

On Saturday, September 15, 2018, 3:16:47 PM PDT, Hayato Shimizu 
 wrote:  
 
 Hi Cassandra folks,
 We built a Cassandra management tool for ourselves, but decided that we'd like 
to share it with you, and it will soon be released to the public for free. We 
called it AxonOps; it provides GUI metrics/logs dashboards, service health 
checks, Cassandra adaptive repair, backup/restore, and currently integrates 
with PagerDuty, Slack, Hipchat (R.I.P.) and email for alerts and notifications.
You can read about it on our blog here: https://digitalis.io/blog/axonops/
We're currently working hard on documentation etc. before making it available 
for you to download.
We'd be interested to learn if anybody would like to use such a tool.
Hayato




Re: bigger data density with Cassandra 4.0?

2018-08-29 Thread dinesh.jo...@yahoo.com.INVALID
With LCS and CASSANDRA-6696 you can maximize the percentage of SSTables that use 
the new streaming path. With LCS and relatively small SSTables you should see good 
gains. Bootstrap is a use case that should see the maximum benefits. This 
feature will get better with time.
Dinesh 
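
As a concrete example of the LCS point above, switching a table to LCS is a single 
schema change; a sketch, with ks.tbl and the sstable size as placeholders:

    ALTER TABLE ks.tbl
      WITH compaction = {'class': 'LeveledCompactionStrategy',
                         'sstable_size_in_mb': 160};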

On Wednesday, August 29, 2018, 12:34:32 AM PDT, kurt greaves 
 wrote:  
 
 My reasoning was that if you have a small cluster with vnodes you're more likely to 
have enough overlap between nodes that whole SSTables will be streamed on major 
ops. As N gets > RF you'll have fewer common ranges and thus be less likely to be 
streaming complete SSTables. Correct me if I've misunderstood.
On 28 August 2018 at 01:37, Dinesh Joshi  wrote:

Although the extent of the benefits depends on the specific use case, the cluster 
size is definitely not a limiting factor.

Dinesh
On Aug 27, 2018, at 5:05 AM, kurt greaves  wrote:


I believe there are caveats: it will only really help if you're not using 
vnodes, or you have a very small cluster, and internode encryption is not 
enabled. Alternatively, if you're using JBOD, vnodes will be marginally better, 
but JBOD is not a great idea (and doesn't guarantee a massive improvement).
On 27 August 2018 at 15:46, dinesh.jo...@yahoo.com.INVALID 
 wrote:

Yes, this feature will help with operating nodes with higher data density.
Dinesh 

On Saturday, August 25, 2018, 9:01:27 PM PDT, onmstester onmstester 
 wrote:  
 
 I've noticed this new feature of 4.0:
Streaming optimizations 
(https://cassandra.apache.org/blog/2018/08/07/faster_streaming_in_cassandra.html)
Does this mean that we could have much more data density with Cassandra 4.0 (fewer 
problems than 3.x)? I mean > 10 TB of data on each node without worrying about 
node join/remove?
This is something needed for write-heavy applications that do not read a lot. 
When you have around 2 TB of data per day and need to keep it for 6 months, it 
would be a waste of money to purchase 180 servers (even commodity or cloud). 
IMHO, even if 4.0 fixes the problem with streaming/joining a new node, compaction 
is still another evil for a big node, but we could tolerate that somehow.


Sent using Zoho Mail



  




  


Re: benefits of HBase over Cassandra

2018-08-24 Thread dinesh.jo...@yahoo.com.INVALID
I've worked with both databases. They're suitable for different use cases. If 
you look at the CAP theorem, HBase is CP while Cassandra is AP. If we talk 
about a specific use case, it'll be easier to discuss.
Dinesh 

On Friday, August 24, 2018, 1:56:31 PM PDT, Vitaliy Semochkin 
 wrote:  
 
 Hi,

I read that Facebook once chose HBase over Cassandra for its messenger,
but I never found what the benefits of HBase over Cassandra are.
Can someone list them, if there are any?

Regards,
Vitaliy


  

Re: duplicate rows for partition

2018-08-22 Thread dinesh.jo...@yahoo.com.INVALID
What is the schema of the table? Could you include the output of DESCRIBE?
Dinesh 
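
Based on the keys described below, the table definition in question would look 
roughly like this; a sketch only, since the table name and column types are 
assumptions:

    CREATE TABLE ks.tbl (
        userid            text,
        secondaryid       text,
        tdate             timestamp,
        tid3              text,
        sid4              text,
        pid5              text,
        associate_degree  bigint,
        PRIMARY KEY ((userid, secondaryid), tdate, tid3, sid4, pid5)
    );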

On Wednesday, August 22, 2018, 2:22:31 PM PDT, Gosar M 
 wrote:  
 
 Hello,
Have a table with the following partition and clustering keys:
partition key - ("userid", "secondaryid")
clustering key - "tDate", "tid3", "sid4", "pid5"
Data is inserted based on the above partition and clustering keys. For one record 
we are seeing 2 rows returned when querying by both the partition and clustering 
keys.

 userid       | secondaryid | tdate                 | tid3        | sid4 | pid5        | associate_degree
--------------+-------------+-----------------------+-------------+------+-------------+--------------------
 090sdfdsf898 | ab984564    | 2018-08-04 07:59:59+  | 0a5995672e3 | l34  | l34_listing | 123145979615694
 090sdfdsf898 | ab984564    | 2018-08-04 07:59:59+  | 0a5995672e3 | l34  | l34_listing | 123145979615694989

We did not have any node that was down longer than gc_grace_seconds.


Thank you. 
  

Re: Huge daily outbound network traffic

2018-08-16 Thread dinesh.jo...@yahoo.com.INVALID
You could also run tcpdump to inspect the streams.
Dinesh 
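
A sketch of the tcpdump suggestion, assuming the default storage_port of 7000 and 
an interface named eth0:

    sudo tcpdump -i eth0 -nn port 7000                          # watch inter-node traffic live
    sudo tcpdump -i eth0 -nn port 7000 -w /tmp/internode.pcap   # or capture for later analysis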

On Thursday, August 16, 2018, 1:11:47 PM PDT, Elliott Sims 
 wrote:  
 
 Since this is cross-node traffic, "nodetool netstats" during the high-traffic 
period should give you a better idea of what's being sent.

On Thu, Aug 16, 2018 at 2:34 AM, Behnam B.Marandi  
wrote:

In the case of cronjobs, there are no jobs in that time period, and I can see the 
effect of jobs like backups and repairs, but the traffic they cause is not 
comparable - around 800MB compared to 2GB. And in this case it is all outbound 
network traffic on all 3 cluster nodes.

On Thu, Aug 16, 2018 at 5:16 PM dinesh.jo...@yahoo.com.INVALID 
 wrote:

Since it is predictable, can you check the logs during that period? What do 
they say? Do you have a cron running on those hosts? Do all the nodes 
experience this issue?
Dinesh 

On Thursday, August 16, 2018, 12:02:55 AM PDT, Behnam B.Marandi 
 wrote:  
 
 Actually I did. It seems this is cross-node traffic from one node to port 7000 
(storage_port) of the other node.

On Sun, Aug 12, 2018 at 2:44 PM Elliott Sims  wrote:

Since it's at a consistent time, maybe just look at it with iftop to see where 
the traffic's going and what port it's coming from?  

On Fri, Aug 10, 2018 at 1:48 AM, Behnam B.Marandi  
wrote:

 I don't have any external process or planned repair in that time period. As for 
the network, I can see outbound traffic on the Cassandra node's network interface, 
but I couldn't find any way to check the VPC network to make sure it is not going 
outside the network. Maybe the only way is analysing the VPC Flow Logs.
B.

On Tue, Aug 7, 2018 at 11:23 PM, Rahul Singh  
wrote:

Are you sure you don't have an outside process that is doing an export, a Spark 
job, or a non-AWS-managed backup process?

Is this network out from Cassandra or from elsewhere on the network?


Rahul
On Aug 7, 2018, 4:09 AM -0400, Behnam B.Marandi , wrote:

Hi,
I have a 3 node Cassandra cluster (version 3.11.1) on m4.xlarge EC2 instances 
with separate EBS volumes for root (gp2), data (gp2) and commitlog (io1). I get 
daily outbound traffic at a certain time every day. As you can see in the 
attached screenshot, while my normal network load hardly reaches 200MB, this 
outbound traffic (orange) spikes up to 2GB while inbound (purple) is less than 
800MB. There is no repair or backup process going on in that time window, so I 
am wondering where to look. Any idea?







  


  


Re: Reading cardinality from Statistics.db failed

2018-08-16 Thread dinesh.jo...@yahoo.com.INVALID
Vitali, 
It doesn't look like there is an existing Jira. It would be helpful if you 
could create one with as much information as possible. Can you reduce this 
issue to a short, repeatable set of steps that we can reproduce? That'll be 
helpful to debug this problem.
Dinesh 
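
When putting the ticket together, the sstablemetadata tool can show whether the 
Statistics.db component of a given SSTable is readable; a sketch, with the path 
as a placeholder:

    sstablemetadata /opt/data/disk5/data/<keyspace>/<table>/mc-123-big-Data.db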

On Wednesday, August 15, 2018, 1:07:21 AM PDT, Vitali Dyachuk 
 wrote:  
 
 I've upgraded to 3.0.17 and the issue is still there. Is there a Jira ticket for 
that bug, or should I create one?
On Wed, Jul 25, 2018 at 2:57 PM Vitali Dyachuk  wrote:

I'm using 3.0.15. I see that there is some fix for sstable metadata in 3.0.16, 
https://issues.apache.org/jira/browse/CASSANDRA-14217 - is that a fix for 
"reading cardinality from Statistics.db"?


On Wed, Jul 25, 2018 at 1:02 PM Hannu Kröger  wrote:

What version of Cassandra are you running? There is a bug in 3.10.0 and certain 
3.0.x that occurs in certain conditions and corrupts that file. 
Hannu
Vitali Dyachuk  kirjoitti 25.7.2018 kello 10.48:


Hi,
I have noticed in the Cassandra system.log that there is some issue with 
sstable metadata; the message says:
WARN  [Thread-6] 2018-07-25 07:12:47,928 SSTableReader.java:249 - Reading 
cardinality from Statistics.db failed for 
/opt/data/disk5/data/keyspace/table/mc-big-Data.db
Although there is no such file. The message appeared after I changed the 
compaction strategy from SizeTiered to Leveled.
Currently I'm running nodetool scrub to rebuild the sstables, and it takes a lot 
of time to scrub all sstables.
Reading the code, it says that if this metadata is broken, estimating the keys 
will be done using the index summary. How expensive is that?
https://github.com/apache/cassandra/blob/cassandra-3.0.15/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L245

The main question is: why has this happened?

Thanks,
Vitali Djatsuk.


  

Re: Thrift to CQL migration under new Keyspace or Cluster

2018-06-25 Thread dinesh.jo...@yahoo.com.INVALID
If you're working in a different keyspace, I don't anticipate any issues. Have 
you attempted one in a test cluster? :)
Dinesh 
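
For the streaming step of that approach, sstableloader is typically pointed at the 
directory holding the generated SSTables; a sketch, with hosts and paths as 
placeholders:

    sstableloader -d 10.0.0.1,10.0.0.2 /path/to/generated/ks_cql/my_table/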

On Friday, June 22, 2018, 1:26:56 AM PDT, Fernando Neves 
 wrote:  
 
 Hi guys,
We are running one of our Cassandra clusters on 2.0.17 with Thrift, and we have 
started the CQL migration plan on 2.0.17 using the CQLSSTableWriter/sstableloader 
method.
Simple question - maybe someone has worked on a similar scenario: is there any 
problem with doing the migration on the same Cassandra instances (nodes) but into 
a different keyspace (ks_thrift to ks_cql), or should we create another 2.0.17 
cluster to do this work? I know the new keyspace will require more host resources, 
but it will be simpler for us, because once a table is migrated we will drop it 
from the old ks_thrift keyspace.
Thanks,
Fernando.

Re: RE: Mongo DB vs Cassandra

2018-06-04 Thread dinesh.jo...@yahoo.com.INVALID
If you have the time, I would suggest creating a prototype with both databases 
and trying it out. You should also have some idea of how this system might 
evolve in the future. That is important because it could very well help you 
make a decision. Mongo or Cassandra may work, but if your requirements evolve in 
a way that works better with Cassandra, you might be better off going with 
Cassandra.
As others have pointed out, each database has its own strengths. Given that you 
may store a 20KB to 600MB row, you may be able to model it with Mongo as well 
as Cassandra. If you plan on having a separate index like Elasticsearch or Solr 
outside the database, I would suggest going with Cassandra.
Other factors to consider are licensing, operational cost, etc.
Dinesh 

On Thursday, May 31, 2018, 9:01:09 AM PDT, Sudhakar Ganesan 
 wrote:  
 
At a high level, in the production line, machines will provide data in the form 
of CSV every 1 second to 1 minute to 1 day (depending on the machine type used 
in the line operations). I need to parse those files, load them into the DB, and 
build an API layer to expose the data to downstream systems.

Number of files to be processed: 13,889,660,134 per day.
Each file could range from 20 KB to 600MB, which will translate into a few 
hundred rows to millions of rows.
High availability with heavy writes; reads are low compared to writes.
While extracting the rows, a few validations need to be performed.
Build an API layer on top of the data persisted in the DB.

Now, tell me what would be the best choice…
 
  
 
From: Russell Bateman [mailto:r...@windofkeltia.com]
Sent: Thursday, May 31, 2018 7:36 PM
To: user@cassandra.apache.org
Subject: Re: Mongo DB vs Cassandra
 
  
 
Sudhakar,

MongoDB will accommodate loading CSV without regard to schema while still 
creating identifiable "columns" in the database, but you'll have to predict or 
back-impose some schema later if you're going to create indices for fast 
searching of the data. You can perform searching of data without indexing in 
MongoDB, but it's slower.

Cassandra will require you to understand the schema, i.e. what the columns are, 
up front, unless you're just going to store the data without schema and, 
therefore, without the ability to search effectively.

As suggested already, you should share more detail if you want good advice. 
Both DBs are excellent. Both do different things in different ways.

Hope this helps,
Russ
 
On 05/31/2018 05:49 AM, Sudhakar Ganesan wrote:
 

Team,
 
 
 
I need to make a decision on MongoDB vs Cassandra for loading the CSV file 
data and storing the CSV files as well. If any of you have done such a study in 
the last couple of months, please share your analysis or observations.
 
 
 
Regards,
 
Sudhakar
 
 

  
   

Re: Cassandra doesn't insert all rows

2018-04-21 Thread dinesh.jo...@yahoo.com.INVALID
Soheil, 
As Jeff mentioned, you need to provide more information. There are no known 
issues that I can think of that would cause such behavior. It would be great if 
you could provide us with a reduced test case so we can try to reproduce this 
behavior, or at least help you debug the issue better. Could you detail the 
version of Cassandra, the number of nodes, the keyspace definition, the RF / CL, 
and perhaps a bit of your client code that does the writes? Did you get back any 
errors on the client or on the server side? These details would help us help you 
further.
Thanks,
Dinesh 

On Saturday, April 21, 2018, 11:06:12 AM PDT, Soheil Pourbafrani 
 wrote:  
 
 I consume data from Kafka and insert it into a Cassandra cluster using the Java 
API. The table has 4 keys, including a timestamp with millisecond precision. But 
when executing the code, it only inserts 120 to 190 rows and ignores the other 
incoming data!
What could be the cause of the problem? Bad insert code where key fields 
overwrite data, improper cluster configuration, ...?
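
One thing worth checking, given the suspicion about overwrites: Cassandra INSERTs 
are upserts, so two writes with identical primary key values silently collapse 
into one row. A sketch illustrating the behaviour, with placeholder table and 
column names:

    CREATE TABLE ks.events (
        id  text,
        ts  timestamp,     -- millisecond precision
        val text,
        PRIMARY KEY (id, ts)
    );
    -- If many Kafka records map to the same (id, ts), the second INSERT
    -- overwrites the first instead of creating a new row:
    INSERT INTO ks.events (id, ts, val) VALUES ('a', '2018-04-21 10:00:00.123+0000', 'x');
    INSERT INTO ks.events (id, ts, val) VALUES ('a', '2018-04-21 10:00:00.123+0000', 'y');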