[ https://issues.apache.org/jira/browse/CASSANDRA-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
KALYAN CHAKRAVARTHY KANCHARLA updated CASSANDRA-14927:
------------------------------------------------------
    Priority: Blocker  (was: Major)

> During data migration from 7 node to 21 node cluster using sstableloader, new
> data is being populated on the new tables & data is being duplicated on user
> type tables
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14927
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14927
>             Project: Cassandra
>          Issue Type: Test
>            Reporter: KALYAN CHAKRAVARTHY KANCHARLA
>            Priority: Blocker
>              Labels: test
>             Fix For: 2.1.13
>
>
> I'm trying to migrate data from a 7-node (single DC) cluster to a 21-node (3
> DC) cluster using sstableloader.
> We have the same versions on both the old and new clusters:
> *cqlsh 5.0.1*
> *Cassandra 2.1.13*
> *CQL spec 3.2.1*
> The old and new clusters are in different networks, so we opened the
> following ports between them:
> 7000 - storage port
> 7001 - SSL storage port
> 7199 - JMX port
> 9042 - client port
> 9160 - Thrift client port
> We use vnodes in the clusters.
> We made sure the cassandra.yaml file on the new cluster is set correctly by
> changing the following options:
>
> {{cluster_name: 'MyCassandraCluster' }}
> {{num_tokens: 256 }}
> {{seed_provider: - }}
> {{class_name: org.apache.cassandra.locator.SimpleSeedProvider }}
> {{parameters: - }}
> {{seeds: "10.168.66.41,10.176.170.59" }}
> {{listen_address: localhost}}
> {{endpoint_snitch: GossipingPropertyFileSnitch}}
> We also changed cassandra-rackdc.properties for each DC, specifying the
> respective DC and rack.
> While creating the keyspaces, we changed replication to
> NetworkTopologyStrategy.
>
> The cluster looks healthy; all the nodes are UP and NORMAL.
>
> {color:#FF0000}*I was able to get the data from old cluster to new cluster.
> But, along with the data from the old cluster, I see some new rows being
> populated in the tables on the new cluster, and data is being duplicated in
> the tables with user type*. {color}
> {color:#333333}We used the following steps to migrate the data:{color}
> # Took snapshots of all the keyspaces that we want to migrate (9
> keyspaces). Used the _nodetool snapshot_ command on the source nodes to take
> a snapshot of the required keyspace/table by specifying _hostname, jmx port_
> and _keyspace_:
> _/a/cassandra/bin/nodetool -u $(sudo su - company -c "cat
> /a/cassandra/jmxremote.password" | awk '{print $1}') -pw $(sudo su - company
> -c "cat /a/cassandra/jmxremote.password" | awk '{print $2}')_ *_-h
> localhost -p 7199 snapshot keyspace_name_*
> # After taking the snapshots, moved the snapshot directories from the source
> nodes to the target node.
>
> → Create a tar file on the source node for the snapshot directory that we
> want to move to the target node.
> tar -cvf file.tar snapshot_name
> → Move this file.tar from the source node to the local machine.
> scp -S gwsh root@192.168.64.99:/a/cassandra/data/file.tar .
> → Move this file.tar from the local machine to a new directory (example:
> test) on the target node.
> scp -S gwsh file.tar root@192.168.58.41:/a/cassandra/data/test/.
> # Untar this file.tar in the test directory on the target node.
> # The path of the sstables must be the same on both source and target.
> # To bulk load these files using _sstableloader_, run sstableloader on the
> source node, indicate one or more nodes in the destination cluster with the
> -d flag, which accepts a comma-separated list of IP addresses or hostnames,
> and specify the path to the sstables on the source node.
> _/a/cassandra/bin/_ *_./sstableloader -d host_IP path_to_sstables_*
> *_Example:_*
> /a/cassandra/bin# sstableloader -d 192.168.58.41 -u popps -pw ******* -tf
> org.apache.cassandra.thrift.SSLTransportFactory -ts
> /a/cassandra/ssl/truststore.jks -tspw test123 -ks
> /a/cassandra/ssl/keystore.jks -kspw test123 -f
> /a/cassandra/conf/cassandra.yaml
> /a/cassandra/data/app_properties/_admins-58524140431511e8bbb6357f562e11ca_/
> Summary statistics:
>    Connections per host:         : 1
>    Total files transferred:      : 9
>    Total bytes transferred:      : 1787893
>    Total duration (ms):          : 2936
>    Average transfer rate (MB/s): : 0
>    Peak transfer rate (MB/s):    : 0
>
> We performed these steps on all the tables and checked the row counts in the
> old and new tables using cqlsh:
> cqlsh> SELECT count(*) FROM keyspace.table;
> Example for a single table:
> count on new table: 341
> count on old table: 303
>
> We were also able to identify the differences between the tables using the
> 'sdiff' command, with the following steps:
> * Created .txt/.csv files for the tables in the old and new clusters.
> * Compared them using the sdiff command.
>
> *Could someone help me understand why new data is being populated in the
> new tables?*
> Please let me know if you need more info.
> PS: After migrating the data the first time and seeing these issues, we
> TRUNCATED all the tables, DROPPED the tables with user 'type', and recreated
> the dropped tables. We then repeated the same migration procedure and
> still see the same issues.
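
The export comparison described above could be sketched roughly as follows. This is only an illustration, not the reporter's actual procedure: it assumes each export holds one row per line with the same column order in both files, and the function names rows_only_in_new and duplicated_rows, as well as the file names old.csv and new.csv used below, are hypothetical. Both files are sorted first because the two clusters are unlikely to return rows in the same order.

```shell
#!/bin/sh
# Sketch: compare two table exports line by line.
# Assumption: one row per line, identical column order in both files.

# Print rows that occur more often in the new export ($2) than in the old ($1).
rows_only_in_new() {
    old_sorted=$(mktemp) || return 1
    new_sorted=$(mktemp) || return 1
    # comm needs both inputs sorted with the same collation, so pin the locale.
    LC_ALL=C sort "$1" > "$old_sorted"
    LC_ALL=C sort "$2" > "$new_sorted"
    # -1 hides rows unique to the old export, -3 hides rows common to both.
    LC_ALL=C comm -13 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}

# Print each row that appears more than once within a single export ($1).
duplicated_rows() {
    LC_ALL=C sort "$1" | uniq -d
}
```

With these definitions, `rows_only_in_new old.csv new.csv` would list the extra rows on the new cluster, and `duplicated_rows new.csv` would list rows that were exported more than once. Rows whose values span several lines would not be handled by this line-based approach.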
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org