[ 
https://issues.apache.org/jira/browse/CASSANDRA-12750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16331346#comment-16331346
 ] 

Pedro Gordo edited comment on CASSANDRA-12750 at 1/18/18 10:24 PM:
-------------------------------------------------------------------

If you run the same query from within cqlsh (with tracing on), do you get a 
message of the sort:
{{Trace session id: blablabla [more]}}
{{Unable to retrieve Query Trace information [more]}} ?


was (Author: pedro_gordo):
If you run the same query from within cqlsh (with tracing on), do you get a 
message of the sort:
{{Trace session id: blablabla [more]
Unable to retrieve Query Trace information [more]}} ?

> TRACE messages get dropped
> --------------------------
>
>                 Key: CASSANDRA-12750
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12750
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths, Observability
>         Environment: Ubuntu 14.04 LTS
>            Reporter: Kuku1
>            Priority: Major
>
> Two of my nodes drop TRACE messages for some reason.
> I'm using Cassandra 3.0.8.
> I'm running a "select * from ... " query that will return 300.000 rows. I 
> want to measure the total query runtime. 
> My application runs the following code. 
> {code}
>     PreparedStatement preparedStatement = 
> connection.prepareStatement(query).enableTracing();
>     if (preparedStatement == null) {
>         return;
>     }
>     BoundStatement bs = preparedStatement.bind();
>     ResultSet rs = connection.execute(bs);
>     while (!rs.isFullyFetched()) {
>         rs.fetchMoreResults();
>     }
>     //all results have been fetched at this point
>     //sleep 15s to wait for C* to write the query traces
>     Thread.sleep(15000);
>     long totalQueryRuntime = 0;
>     List<ExecutionInfo> executionInfos = rs.getAllExecutionInfo();
>     for (ExecutionInfo ex : executionInfos) {
>         QueryTrace queryTrace = ex.getQueryTrace();
>         totalQueryRuntime += queryTrace.getDurationMicros();
>         if (verbose) {
>             for (Event event : queryTrace.getEvents()) {
>                 //some logging
>             }
>         }
>     }
> {code}
> The application fails with this exception:
> {code}
> com.datastax.driver.core.exceptions.TraceRetrievalException: Unable to 
> retrieve complete query trace for id 88dae7d0-8ada-11e6-9f80-d9082a06988f 
> after 5 tries
>         at 
> com.datastax.driver.core.QueryTrace.doFetchTrace(QueryTrace.java:228)
>         at 
> com.datastax.driver.core.QueryTrace.maybeFetchTrace(QueryTrace.java:177)
>         at 
> com.datastax.driver.core.QueryTrace.getDurationMicros(QueryTrace.java:103)
> {code}
> In my system.logs, I can find these output (similar) on two of my six nodes:
> {code}
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,650 MessagingService.java:983 - 
> _TRACE messages were dropped in last 5000 ms: 511 for internal timeout and 0 
> for cross node timeout
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,651 StatusLogger.java:52 - Pool 
> Name                    Active   Pending      Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,657 StatusLogger.java:56 - 
> MutationStage                     0         0          35036         0        
>          0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,658 StatusLogger.java:56 - 
> ViewMutationStage                 0         0              0         0        
>          0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,658 StatusLogger.java:56 - 
> ReadStage                         0         0            399         0        
>          0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,659 StatusLogger.java:56 - 
> RequestResponseStage              0         0           7158         0        
>          0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,659 StatusLogger.java:56 - 
> ReadRepairStage                   0         0              2         0        
>          0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,660 StatusLogger.java:56 - 
> CounterMutationStage              0         0              0         0        
>          0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,661 StatusLogger.java:56 - 
> MiscStage                         0         0              0         0        
>          0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,661 StatusLogger.java:56 - 
> CompactionExecutor                0         0             52         0        
>          0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,662 StatusLogger.java:56 - 
> MemtableReclaimMemory             0         0             21         0        
>          0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,662 StatusLogger.java:56 - 
> PendingRangeCalculator            0         0              8         0        
>          0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,663 StatusLogger.java:56 - 
> GossipStage                       0         0            476         0        
>          0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,663 StatusLogger.java:56 - 
> SecondaryIndexManagement          0         0              0         0        
>          0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,664 StatusLogger.java:56 - 
> HintsDispatcher                   0         0              0         0        
>          0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,664 StatusLogger.java:56 - 
> MigrationStage                    0         0              0         0        
>          0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,665 StatusLogger.java:56 - 
> MemtablePostFlush                 0         0             40         0        
>          0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,666 StatusLogger.java:56 - 
> ValidationExecutor                0         0              0         0        
>          0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,666 StatusLogger.java:56 - 
> Sampler                           0         0              0         0        
>          0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,667 StatusLogger.java:56 - 
> MemtableFlushWriter               0         0             21         0        
>          0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,668 StatusLogger.java:56 - 
> InternalResponseStage             0         0              0         0        
>          0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,669 StatusLogger.java:56 - 
> AntiEntropyStage                  0         0              0         0        
>          0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,670 StatusLogger.java:56 - 
> CacheCleanupExecutor              0         0              0         0        
>          0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,671 StatusLogger.java:56 - 
> Native-Transport-Requests         0         0            116         0        
>          0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,671 StatusLogger.java:66 - 
> CompactionManager                 0         1
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,671 StatusLogger.java:78 - 
> MessagingService                n/a       0/0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,671 StatusLogger.java:88 - Cache 
> Type                     Size                 Capacity               
> KeysToSave
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,671 StatusLogger.java:90 - 
> KeyCache                       3160                104857600                  
>     all
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,671 StatusLogger.java:96 - 
> RowCache                          0                        0                  
>     all
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,672 StatusLogger.java:103 - 
> Table                       Memtable ops,data
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,672 StatusLogger.java:106 - 
> system_distributed.parent_repair_history                 0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,672 StatusLogger.java:106 - 
> system_distributed.repair_history                 0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,672 StatusLogger.java:106 - 
> system.compaction_history               2,384
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,672 StatusLogger.java:106 - 
> system.hints                              0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,672 StatusLogger.java:106 - 
> system.schema_aggregates                  0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 - 
> system.IndexInfo                          0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 - 
> system.schema_columnfamilies                 0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 - 
> system.schema_triggers                    0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 - 
> system.size_estimates             2120,252650
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 - 
> system.schema_functions                   0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 - 
> system.paxos                              0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 - 
> system.views_builds_in_progress                 0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 - 
> system.built_views                        0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,674 StatusLogger.java:106 - 
> system.peer_events                        0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,674 StatusLogger.java:106 - 
> system.range_xfers                        0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,674 StatusLogger.java:106 - 
> system.peers                         97,46227
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,674 StatusLogger.java:106 - 
> system.batches                            0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,674 StatusLogger.java:106 - 
> system.schema_keyspaces                   0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,674 StatusLogger.java:106 - 
> system.schema_usertypes                   0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,674 StatusLogger.java:106 - 
> system.local                              0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 - 
> system.sstable_activity               28,1100
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 - 
> system.available_ranges                   0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 - 
> system.batchlog                           0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 - 
> system.schema_columns                     0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 - 
> fleetstream.data                          0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 - 
> system_schema.columns               168,25858
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 - 
> system_schema.types                      2,16
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 - 
> system_schema.indexes                    2,16
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,676 StatusLogger.java:106 - 
> system_schema.keyspaces                 4,242
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,676 StatusLogger.java:106 - 
> system_schema.dropped_columns                2,16
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,676 StatusLogger.java:106 - 
> system_schema.aggregates                 2,16
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,676 StatusLogger.java:106 - 
> system_schema.triggers                   2,16
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,676 StatusLogger.java:106 - 
> system_schema.tables                 27,17400
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,676 StatusLogger.java:106 - 
> system_schema.views                      2,16
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,676 StatusLogger.java:106 - 
> system_schema.functions                  2,16
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,677 StatusLogger.java:106 - 
> test.data                                 0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,677 StatusLogger.java:106 - 
> system_auth.roles                         0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,677 StatusLogger.java:106 - 
> system_auth.role_members                  0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,677 StatusLogger.java:106 - 
> system_auth.resource_role_permissons_index                 0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,677 StatusLogger.java:106 - 
> system_auth.role_permissions                 0,0
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,677 StatusLogger.java:106 - 
> system_traces.sessions                58,9106
> INFO  [ScheduledTasks:1] 2016-10-05 12:51:47,677 StatusLogger.java:106 - 
> system_traces.events                320,62299
> {code}
> nodetool tpstats on one of the two nodes that drop the traces shows:
> {code}
> nodetool tpstats
> Pool Name                    Active   Pending      Completed   Blocked  All 
> time blocked
> MutationStage                     0         0          32911         0        
>          0
> ViewMutationStage                 0         0              0         0        
>          0
> ReadStage                         0         0           1065         0        
>          0
> RequestResponseStage              0         0          10969         0        
>          0
> ReadRepairStage                   0         0              6         0        
>          0
> CounterMutationStage              0         0              0         0        
>          0
> MiscStage                         0         0              0         0        
>          0
> CompactionExecutor                0         0            972         0        
>          0
> MemtableReclaimMemory             0         0             21         0        
>          0
> PendingRangeCalculator            0         0              7         0        
>          0
> GossipStage                       0         0           6278         0        
>          0
> SecondaryIndexManagement          0         0              0         0        
>          0
> HintsDispatcher                   0         0              0         0        
>          0
> MigrationStage                    0         0              0         0        
>          0
> MemtablePostFlush                 0         0             40         0        
>          0
> ValidationExecutor                0         0              0         0        
>          0
> Sampler                           0         0              0         0        
>          0
> MemtableFlushWriter               0         0             21         0        
>          0
> InternalResponseStage             0         0              0         0        
>          0
> AntiEntropyStage                  0         0              0         0        
>          0
> CacheCleanupExecutor              0         0              0         0        
>          0
> Native-Transport-Requests         0         0            209         0        
>          0
> Message type           Dropped
> READ                         0
> RANGE_SLICE                  0
> _TRACE                    2789
> HINT                         0
> MUTATION                     0
> COUNTER_MUTATION             0
> BATCH_STORE                  0
> BATCH_REMOVE                 0
> REQUEST_RESPONSE             0
> PAGED_RANGE                  0
> READ_REPAIR                  0
> {code}
> It should not be a memory problem because my nodes have ~20GB of free RAM:
> {code}
>              total       used       free     shared    buffers     cached
> Mem:      49450072   27703928   21746144        944     181696   17040856
> -/+ buffers/cache:   10481376   38968696
> Swap:     50319356        148   50319208
>              total       used       free     shared    buffers     cached
> Mem:      49450072   30405324   19044748       1124     225256   17627620
> -/+ buffers/cache:   12552448   36897624
> Swap:     50319356          0   50319356
> {code}
> I tried with default fetchSize (5000) as well as fetchSize 50000 - both did 
> not help. Restarting all C* services did not help either. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to