[ https://issues.apache.org/jira/browse/USERGRID-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617979#comment-14617979 ]
Michael Russo edited comment on USERGRID-785 at 7/8/15 5:20 AM:
----------------------------------------------------------------

The root cause of this issue occurred before running the migration, when data was loaded into Usergrid via 2.0. The issue is caused by a large number of tombstones in the Graph_Source_Node_Edges column family. These tombstones were created by the internal shard implementation in Usergrid: entities were moved in Cassandra, causing a large number of deletes (more than 100k, which is the default tombstone_failure_threshold in Cassandra). Example log statements showing the entities were moved:

{code}
2015-07-01 18:28:28,954 [graphTaskExecutor-1] INFO org.apache.usergrid.persistence.graph.serialization.impl.shard.impl.ShardGroupCompactionImpl- Finished compacting [Shard{shardIndex=1435770644766002, createdTime=1435771087041, compacted=true}] shards and moved 74009 edges
{code}

{code}
2015-07-01 18:28:28,755 [graphTaskExecutor-1] INFO org.apache.usergrid.persistence.graph.serialization.impl.shard.impl.ShardGroupCompactionImpl- Finished compacting [Shard{shardIndex=1435770644766002, createdTime=1435771087041, compacted=true}] shards and moved 58072 edges
{code}

To move past this, the workaround was to update the Graph_Source_Node_Edges column family in Cassandra to use SizeTieredCompactionStrategy and set gc_grace to something low, like 60 seconds:

{code}
update column family Graph_Source_Node_Edges with gc_grace=60 and compaction_strategy='SizeTieredCompactionStrategy' and compaction_strategy_options={min_sstable_size:50};
{code}

After this update, and once the gc_grace period had passed, the following command was run on the Cassandra nodes to manually force a compaction, which removed the tombstones:

{code}
nodetool compact <yourKeyspaceName> Graph_Source_Node_Edges
{code}

Once the compaction is complete, revert back to LeveledCompactionStrategy with the default gc_grace of 864000 seconds (10 days):

{code}
update column family Graph_Source_Node_Edges with compaction_strategy='LeveledCompactionStrategy' and gc_grace=864000 and compaction_strategy_options={sstable_size_in_mb:512};
{code}

> Unable to run 2.0 to 2.1 data migration on large data set.
> ----------------------------------------------------------
>
>                 Key: USERGRID-785
>                 URL: https://issues.apache.org/jira/browse/USERGRID-785
>             Project: Usergrid
>          Issue Type: Bug
>            Reporter: Michael Russo
>            Assignee: Michael Russo
>
> Attempted to run the entity data migration (UG 2.0 to UG 2.1) on an instance having 1.1 million entities in a single collection, within a single application. This causes the following exception on Cassandra:
> {code}
> ERROR [ReadStage:16] 2015-07-02 23:33:29,720 SliceQueryFilter.java (line 206) Scanned over 100000 tombstones in ug_migrate_test.Graph_Source_Node_Edges; query aborted (see tombstone_failure_threshold)
> ERROR [ReadStage:27] 2015-07-02 23:33:39,723 SliceQueryFilter.java (line 206) Scanned over 100000 tombstones in ug_migrate_test.Graph_Source_Node_Edges; query aborted (see tombstone_failure_threshold)
> ERROR [ReadStage:27] 2015-07-02 23:33:39,723 CassandraDaemon.java (line 258) Exception in thread Thread[ReadStage:27,5,main]
> java.lang.RuntimeException: org.apache.cassandra.db.filter.TombstoneOverwhelmingException
> at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2016)
> at
> {code}
> which results in the following exception in Usergrid:
> {code}
> 2015-07-02 23:33:59,680 [Index migrate data formats] ERROR org.apache.usergrid.rest.MigrateResource- Unable to migrate data
> java.lang.RuntimeException: Unable to connect to casandra
> at org.apache.usergrid.persistence.core.astyanax.MultiRowColumnIterator.advance(MultiRowColumnIterator.java:190)
> at org.apache.usergrid.persistence.core.astyanax.MultiRowColumnIterator.hasNext(MultiRowColumnIterator.java:122)
> at org.apache.usergrid.persistence.graph.serialization.impl.shard.impl.ShardsColumnIterator.hasNext(ShardsColumnIterator.java:65)
> at org.apache.usergrid.persistence.graph.serialization.impl.shard.impl.ShardGroupColumnIterator.advance(ShardGroupColumnIterator.java:120)
> at org.apache.usergrid.persistence.graph.serialization.impl.shard.impl.ShardGroupColumnIterator.hasNext(ShardGroupColumnIterator.java:68)
> at org.apache.usergrid.persistence.core.rx.ObservableIterator.call(ObservableIterator.java:66)
> at org.apache.usergrid.persistence.core.rx.ObservableIterator.call(ObservableIterator.java:38)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable.unsafeSubscribe(Observable.java:7495)
> at rx.internal.operators.OperatorMerge$MergeSubscriber.handleNewSource(OperatorMerge.java:215)
> at rx.internal.operators.OperatorMerge$MergeSubscriber.onNext(OperatorMerge.java:185)
> at rx.internal.operators.OperatorMerge$MergeSubscriber.onNext(OperatorMerge.java:120)
> at rx.internal.operators.OperatorMap$1.onNext(OperatorMap.java:55)
> at rx.internal.operators.OperatorDoOnEach$1.onNext(OperatorDoOnEach.java:84)
> at org.apache.usergrid.persistence.core.rx.ObservableIterator.call(ObservableIterator.java:71)
> at org.apache.usergrid.persistence.core.rx.ObservableIterator.call(ObservableIterator.java:38)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable.unsafeSubscribe(Observable.java:7495)
> at rx.internal.operators.OperatorMerge$MergeSubscriber.handleNewSource(OperatorMerge.java:215)
> at rx.internal.operators.OperatorMerge$MergeSubscriber.onNext(OperatorMerge.java:185)
> at rx.internal.operators.OperatorMerge$MergeSubscriber.onNext(OperatorMerge.java:120)
> at rx.internal.operators.OnSubscribeFromIterable$IterableProducer.request(OnSubscribeFromIterable.java:96)
> at rx.Subscriber.setProducer(Subscriber.java:177)
> at rx.internal.operators.OnSubscribeFromIterable.call(OnSubscribeFromIterable.java:47)
> at rx.internal.operators.OnSubscribeFromIterable.call(OnSubscribeFromIterable.java:33)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable.unsafeSubscribe(Observable.java:7495)
> at rx.internal.operators.OperatorMerge$MergeSubscriber.handleNewSource(OperatorMerge.java:215)
> at rx.internal.operators.OperatorMerge$MergeSubscriber.onNext(OperatorMerge.java:185)
> at rx.internal.operators.OperatorMerge$MergeSubscriber.onNext(OperatorMerge.java:120)
> at rx.internal.operators.OperatorMap$1.onNext(OperatorMap.java:55)
> at rx.internal.operators.OperatorMerge$InnerSubscriber.emit(OperatorMerge.java:676)
> at rx.internal.operators.OperatorMerge$InnerSubscriber.onNext(OperatorMerge.java:586)
> at rx.internal.operators.OperatorMerge$InnerSubscriber.emit(OperatorMerge.java:676)
> at rx.internal.operators.OperatorMerge$InnerSubscriber.onNext(OperatorMerge.java:586)
> at rx.internal.operators.OperatorMap$1.onNext(OperatorMap.java:55)
> at rx.internal.operators.OperatorFilter$1.onNext(OperatorFilter.java:54)
> at rx.internal.operators.OperatorDoOnEach$1.onNext(OperatorDoOnEach.java:84)
> at rx.internal.operators.OperatorMerge$MergeSubscriber.handleScalarSynchronousObservableWithRequestLimits(OperatorMerge.java:280)
> at rx.internal.operators.OperatorMerge$MergeSubscriber.handleScalarSynchronousObservable(OperatorMerge.java:243)
> at rx.internal.operators.OperatorMerge$MergeSubscriber.onNext(OperatorMerge.java:176)
> at rx.internal.operators.OperatorMerge$MergeSubscriber.onNext(OperatorMerge.java:120)
> at rx.internal.operators.OperatorMap$1.onNext(OperatorMap.java:55)
> at rx.internal.operators.OperatorDoOnEach$1.onNext(OperatorDoOnEach.java:84)
> at org.apache.usergrid.persistence.collection.impl.EntityCollectionManagerImpl$1.call(EntityCollectionManagerImpl.java:248)
> at org.apache.usergrid.persistence.collection.impl.EntityCollectionManagerImpl$1.call(EntityCollectionManagerImpl.java:240)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable.unsafeSubscribe(Observable.java:7495)
> at rx.internal.operators.OperatorMerge$MergeSubscriber.handleNewSource(OperatorMerge.java:215)
> at rx.internal.operators.OperatorMerge$MergeSubscriber.onNext(OperatorMerge.java:185)
> at rx.internal.operators.OperatorMerge$MergeSubscriber.onNext(OperatorMerge.java:120)
> at rx.internal.operators.OperatorMap$1.onNext(OperatorMap.java:55)
> at rx.internal.operators.OperatorDoOnEach$1.onNext(OperatorDoOnEach.java:84)
> at rx.internal.operators.OperatorMerge$InnerSubscriber.emit(OperatorMerge.java:676)
> at rx.internal.operators.OperatorMerge$InnerSubscriber.onNext(OperatorMerge.java:586)
> at rx.internal.operators.OperatorFilter$1.onNext(OperatorFilter.java:54)
> at rx.internal.operators.OnSubscribeFromIterable$IterableProducer.request(OnSubscribeFromIterable.java:96)
> at rx.Subscriber.setProducer(Subscriber.java:177)
> at rx.Subscriber.setProducer(Subscriber.java:171)
> at rx.internal.operators.OnSubscribeFromIterable.call(OnSubscribeFromIterable.java:47)
> at rx.internal.operators.OnSubscribeFromIterable.call(OnSubscribeFromIterable.java:33)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable.unsafeSubscribe(Observable.java:7495)
> at rx.internal.operators.OperatorMerge$MergeSubscriber.handleNewSource(OperatorMerge.java:215)
> at rx.internal.operators.OperatorMerge$MergeSubscriber.onNext(OperatorMerge.java:185)
> at rx.internal.operators.OperatorMerge$MergeSubscriber.onNext(OperatorMerge.java:120)
> at rx.internal.operators.OperatorMap$1.onNext(OperatorMap.java:55)
> at rx.internal.operators.OperatorBufferWithSize$1.onCompleted(OperatorBufferWithSize.java:119)
> at org.apache.usergrid.persistence.core.rx.ObservableIterator.call(ObservableIterator.java:75)
> at org.apache.usergrid.persistence.core.rx.ObservableIterator.call(ObservableIterator.java:38)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable.unsafeSubscribe(Observable.java:7495)
> at rx.internal.operators.OperatorMerge$MergeSubscriber.handleNewSource(OperatorMerge.java:215)
> at rx.internal.operators.OperatorMerge$MergeSubscriber.onNext(OperatorMerge.java:185)
> at rx.internal.operators.OperatorMerge$MergeSubscriber.onNext(OperatorMerge.java:120)
> at rx.internal.operators.OnSubscribeFromIterable$IterableProducer.request(OnSubscribeFromIterable.java:96)
> at rx.Subscriber.setProducer(Subscriber.java:177)
> at rx.internal.operators.OnSubscribeFromIterable.call(OnSubscribeFromIterable.java:47)
> at rx.internal.operators.OnSubscribeFromIterable.call(OnSubscribeFromIterable.java:33)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable$1.call(Observable.java:144)
> at rx.Observable$1.call(Observable.java:136)
> at rx.Observable.unsafeSubscribe(Observable.java:7495)
> at rx.internal.operators.OperatorSubscribeOn$1$1.call(OperatorSubscribeOn.java:62)
> at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: com.netflix.astyanax.connectionpool.exceptions.OperationTimeoutException: OperationTimeoutException: [host=10.16.4.135(10.16.4.135):9160, latency=5001(30012), attempts=6]TimedOutException()
> at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:171)
> at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:65)
> at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28)
> at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:151)
> at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:119)
> at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:338)
> at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4.execute(ThriftColumnFamilyQueryImpl.java:532)
> at org.apache.usergrid.persistence.core.astyanax.MultiRowColumnIterator.advance(MultiRowColumnIterator.java:187)
> ... 146 more
> Caused by: TimedOutException()
> at org.apache.cassandra.thrift.Cassandra$multiget_slice_result.read(Cassandra.java:10480)
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
> at org.apache.cassandra.thrift.Cassandra$Client.recv_multiget_slice(Cassandra.java:673)
> at org.apache.cassandra.thrift.Cassandra$Client.multiget_slice(Cassandra.java:657)
> at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4$1.internalExecute(ThriftColumnFamilyQueryImpl.java:538)
> at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4$1.internalExecute(ThriftColumnFamilyQueryImpl.java:535)
> at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)
> ... 152 more
> {code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
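For operators who want to script the workaround described in the comment above (lower gc_grace and switch to SizeTieredCompactionStrategy, wait out gc_grace, force a compaction, then revert), the command sequence can be generated in order. The sketch below is illustrative only: the helper name and structure are not part of Usergrid, and it builds command strings rather than executing them against a live cluster.

```python
def workaround_commands(keyspace):
    """Return the workaround steps for Graph_Source_Node_Edges, in order.

    Hypothetical helper (not part of Usergrid). The first and last strings
    are cassandra-cli statements; the middle one is a shell command. The
    wait for gc_grace (60 s here) between step 1 and step 2 is a pause,
    not a command, so it is not represented.
    """
    lower_gc_grace = (
        "update column family Graph_Source_Node_Edges "
        "with gc_grace=60 "
        "and compaction_strategy='SizeTieredCompactionStrategy' "
        "and compaction_strategy_options={min_sstable_size:50};"
    )
    force_compaction = "nodetool compact %s Graph_Source_Node_Edges" % keyspace
    revert = (
        "update column family Graph_Source_Node_Edges "
        "with compaction_strategy='LeveledCompactionStrategy' "
        "and gc_grace=864000 "
        "and compaction_strategy_options={sstable_size_in_mb:512};"
    )
    return [lower_gc_grace, force_compaction, revert]


# Substitute your own keyspace name for ug_migrate_test.
for command in workaround_commands("ug_migrate_test"):
    print(command)
```

Note that `nodetool compact` must be run on each Cassandra node, and the revert should only happen after the forced compaction has finished everywhere.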