[ 
https://issues.apache.org/jira/browse/CASSANDRA-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KALYAN CHAKRAVARTHY KANCHARLA updated CASSANDRA-14927:
------------------------------------------------------
    Priority: Blocker  (was: Major)

> During data migration from 7 node to 21 node cluster using sstableloader, new 
> data is being populated on the new tables & data is being duplicated on user 
> type tables 
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14927
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14927
>             Project: Cassandra
>          Issue Type: Test
>            Reporter: KALYAN CHAKRAVARTHY KANCHARLA
>            Priority: Blocker
>              Labels: test
>             Fix For: 2.1.13
>
>
> I'm trying to migrate data from a 7-node (single DC) cluster to a 21-node (3 
> DC) cluster using sstableloader.
> We run the same versions on both the old and new clusters.
> *cqlsh 5.0.1* 
>  *Cassandra 2.1.13* 
>  *CQL spec 3.2.1* 
> The old and new clusters are in different networks, so we opened the 
> following ports between them:
> 7000- storage port
> 7001- ssl storage port
> 7199- JMX port
> 9042- client port
> 9160- Thrift client port
> We use vnodes in the clusters.
> We made sure the cassandra.yaml file on the new cluster is set correctly by 
> changing the following options:
>  
> {{cluster_name: 'MyCassandraCluster' }}
> {{num_tokens: 256 }}
> {{seed_provider: - }}
> {{class_name: org.apache.cassandra.locator.SimpleSeedProvider }}
> {{parameters: - }}
> {{seeds: "10.168.66.41,10.176.170.59" }}
> {{listen_address: localhost}}
> {{endpoint_snitch: GossipingPropertyFileSnitch}}
> We also changed cassandra-rackdc.properties for each DC, specifying the 
> respective DC and rack.
> While creating the keyspaces, we changed the replication strategy to 
> NetworkTopologyStrategy.
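
As an illustration of the keyspace step above, here is a minimal Python sketch that builds a CREATE KEYSPACE statement using NetworkTopologyStrategy. The function name `create_keyspace_cql`, the data center names DC1 to DC3 and the replication factor of 3 are assumptions for illustration; the report does not state them. With GossipingPropertyFileSnitch, the data center names used in the statement need to match the dc values set in cassandra-rackdc.properties.

```python
def create_keyspace_cql(keyspace, replication_per_dc):
    """Return a CREATE KEYSPACE statement that uses NetworkTopologyStrategy.

    replication_per_dc maps a data center name to its replication factor.
    The names and factors are supplied by the caller (hypothetical here).
    """
    if not replication_per_dc:
        raise ValueError("at least one data center is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    # Sort the data centers so the generated statement is deterministic.
    for dc, rf in sorted(replication_per_dc.items()):
        options.append("'%s': %d" % (dc, int(rf)))
    return (
        "CREATE KEYSPACE IF NOT EXISTS " + keyspace
        + " WITH replication = {" + ", ".join(options) + "};"
    )


# Hypothetical data center names and replication factors.
print(create_keyspace_cql("app_properties", {"DC1": 3, "DC2": 3, "DC3": 3}))
```

The printed statement can be reviewed and then run in cqlsh on the new cluster before loading any sstables.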
>  
> The cluster looks healthy; all the nodes are UP and NORMAL. 
>  
> {color:#FF0000}*I was able to get the data from old cluster to new cluster. 
> But, along with the data from old cluster, I see some new rows being 
> populated in the tables on new cluster and data is being duplicated in the 
> tables with user type*. {color}
> {color:#333333}We have used the following steps to migrate data:{color}
>  # Took snapshots of all the keyspaces that we want to migrate (9 
> keyspaces). Used the _nodetool snapshot_ command on the source nodes to take a 
> snapshot of the required keyspace/table by specifying _hostname, jmx port_ and 
> _keyspace_
>  __ 
> _/a/cassandra/bin/nodetool -u $(sudo su - company -c "cat 
> /a/cassandra/jmxremote.password" | awk '\{print $1}') -pw $(sudo su - company 
> -c "cat /a/cassandra/jmxremote.password" | awk '\{print $2}')_  *_-h 
> localhost -p 7199 snapshot keyspace_name_*
>  # After taking the snapshots, move the snapshot directories from the source 
> nodes to the target node.
>        
> → Create a tar file on source node for the snapshot directory that we want to 
> move on to target node.
>      tar -cvf file.tar snapshot_name
> → Move this file.tar from source node to local machine.
>      scp -S gwsh root@192.168.64.99:/a/cassandra/data/file.tar .
> → Now move this file.tar from the local machine to a new directory (example: 
> test) on the target node.
>     scp -S gwsh file.tar root@192.168.58.41:/a/cassandra/data/test/.
>  # Now untar this file.tar in test directory in target node.
>  # The path of the sstables must be the same on both source and target.
>  # To bulk load these files using _sstableloader, run sstableloader on the 
> source node, indicate one or more nodes in the destination cluster with the -d 
> flag, which accepts a comma-separated list of IP addresses or hostnames, and 
> specify the path to the sstables on the source node._ __ 
> _/a/Cassandra/bin/_ *_./sstableloader -d host_IP path_to_sstables_*
>           *_Example:_*
> [/a/cassandra/bin#|mailto:root@sqa-cassandra03.sqaextranet:/a/cassandra/bin] 
> sstableloader -d 192.168.58.41 -u popps -pw ******* -tf 
> org.apache.cassandra.thrift.SSLTransportFactory -ts 
> /a/cassandra/ssl/truststore.jks -tspw test123 -ks 
> /a/cassandra/ssl/keystore.jks -kspw test123 -f 
> /a/cassandra/conf/cassandra.yaml 
> /a/cassandra/data/app_properties/_admins-58524140431511e8bbb6357f562e11ca_/ 
> Summary statistics:
>  Connections per host: : 1
>  Total files transferred: : 9
>  Total bytes transferred: : 1787893
>  Total duration (ms): : 2936
>  Average transfer rate (MB/s): : 0
>  Peak transfer rate (MB/s): : 0
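
Since the load above has to be repeated for every table, the per-table invocations could be generated rather than typed by hand. The following is a minimal sketch under stated assumptions: the helper name `sstableloader_commands` is hypothetical, only the `-d` flag from the report is used, and any further options (credentials, SSL stores, `-f`) are passed through unchanged via `extra_args`. The example path is the one from the report with the wiki italic underscores removed, which is an assumption about the real directory name. The sketch only builds and prints the commands; it does not run them.

```python
import shlex


def sstableloader_commands(table_dirs, hosts, extra_args=()):
    """Build one sstableloader argument list per table directory.

    table_dirs: paths ending in <keyspace>/<table directory>.
    hosts:      destination nodes, joined with commas for the -d flag.
    extra_args: additional options, passed through as given.
    """
    if not hosts:
        raise ValueError("at least one destination host is required")
    commands = []
    for table_dir in table_dirs:
        commands.append(
            ["sstableloader", "-d", ",".join(hosts), *extra_args, str(table_dir)]
        )
    return commands


# Print the commands for review before running anything.
example_dirs = [
    "/a/cassandra/data/app_properties/admins-58524140431511e8bbb6357f562e11ca",
]
for cmd in sstableloader_commands(example_dirs, ["192.168.58.41"]):
    print(shlex.join(cmd))
```

Keeping the commands as argument lists avoids shell quoting problems if they are later passed to `subprocess.run`.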
>  
> We performed these steps on all the tables and checked the row counts in the 
> old and new tables using cqlsh:
> cqlsh> SELECT count(*) FROM keyspace.table;
> example for a single table:
> count on new table: 341
> count on old table: 303
>  
> We were also able to identify the differences between the tables using the 
> 'sdiff' command, with the following steps:
>  * created .txt/.csv files for tables in old and new clusters.
>  * compared them using sdiff command   
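
A side-by-side diff shows that the exports differ, but it does not directly say which rows are extra and which are repeated. As a complement, here is a minimal Python sketch that compares two exports as multisets of lines. The function name `compare_exports` and the sample rows are hypothetical, and the sketch assumes both exports use the same column order and formatting, so that identical rows produce identical lines.

```python
from collections import Counter


def compare_exports(old_lines, new_lines):
    """Compare two table exports line by line, ignoring order and blank lines."""
    old = Counter(line.strip() for line in old_lines if line.strip())
    new = Counter(line.strip() for line in new_lines if line.strip())
    return {
        # Lines present more often in the new export than in the old one.
        "only_in_new": sorted((new - old).elements()),
        # Lines present more often in the old export than in the new one.
        "only_in_old": sorted((old - new).elements()),
        # Lines that occur more than once in the new export.
        "duplicated_in_new": sorted(row for row, n in new.items() if n > 1),
    }


# Hypothetical sample rows standing in for the exported .csv files.
old_rows = ["a,1", "b,2"]
new_rows = ["a,1", "b,2", "b,2", "c,3"]
print(compare_exports(old_rows, new_rows))
```

With real exports, the two arguments can be open file objects, since iterating over a file yields its lines.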
>  
> *Could someone help me understand what causes the new data to appear in the 
> new tables?*  
> Please let me know if you need more info.
> PS: After we migrated the data for the first time and saw these issues, we 
> TRUNCATED all the tables, DROPPED the tables with user 'type' and recreated 
> them, then repeated the same migration procedure. 
> We still see the same issues. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
