[ https://issues.apache.org/jira/browse/CASSANDRA-12750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16331346#comment-16331346 ]
Pedro Gordo edited comment on CASSANDRA-12750 at 1/18/18 10:24 PM: ------------------------------------------------------------------- If you run the same query from within cqlsh (with tracing on), do you get a message of the sort: {{Trace session id: blablabla [more] Unable to retrieve Query Trace information [more]}} ? was (Author: pedro_gordo): If you run the same query from within cqlsh (with tracing on), do you get a message of the sort: {{Trace session id: blablabla [more] Unable to retrieve Query Trace information [more]}} ? > TRACE messages get dropped > -------------------------- > > Key: CASSANDRA-12750 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12750 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths, Observability > Environment: Ubuntu 14.04 LTS > Reporter: Kuku1 > Priority: Major > > Two of my nodes drop TRACE messages for some reason. > I'm using Cassandra 3.0.8. > I'm running a "select * from ... " query that will return 300.000 rows. I > want to measure the total query runtime. > My application runs the following code. > {code} > PreparedStatement preparedStatement = > connection.prepareStatement(query).enableTracing(); > if (preparedStatement == null) { > return; > } > BoundStatement bs = preparedStatement.bind(); > ResultSet rs = connection.execute(bs); > while (!rs.isFullyFetched()) { > rs.fetchMoreResults(); > } > //all results have been fetched at this point > //sleep 15s to wait for C* to write the query traces > Thread.sleep(15000); > long totalQueryRuntime = 0; > List<ExecutionInfo> executionInfos = rs.getAllExecutionInfo(); > for (ExecutionInfo ex : executionInfos) { > QueryTrace queryTrace = ex.getQueryTrace(); > totalQueryRuntime += queryTrace.getDurationMicros(); > if (verbose) { > for (Event event : queryTrace.getEvents()) { > //some logging > } > } > } > {code} > The application fails with this exception: > {code} > com.datastax.driver.core.exceptions.TraceRetrievalException: Unable to > retrieve complete query trace for id 88dae7d0-8ada-11e6-9f80-d9082a06988f > after 5 tries > at > com.datastax.driver.core.QueryTrace.doFetchTrace(QueryTrace.java:228) > at > com.datastax.driver.core.QueryTrace.maybeFetchTrace(QueryTrace.java:177) > at > com.datastax.driver.core.QueryTrace.getDurationMicros(QueryTrace.java:103) > {code} > In my system.logs, I can find these output (similar) on two of my six nodes: > {code} > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,650 MessagingService.java:983 - > _TRACE messages were dropped in last 5000 ms: 511 for internal timeout and 0 > for cross node timeout > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,651 StatusLogger.java:52 - Pool > Name Active Pending Completed Blocked All Time > Blocked > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,657 StatusLogger.java:56 - > MutationStage 0 0 35036 0 > 0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,658 StatusLogger.java:56 - > ViewMutationStage 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,658 StatusLogger.java:56 - > ReadStage 0 0 399 0 > 0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,659 StatusLogger.java:56 - > RequestResponseStage 0 0 7158 0 > 0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,659 StatusLogger.java:56 - > ReadRepairStage 0 0 2 0 > 0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,660 StatusLogger.java:56 - > CounterMutationStage 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,661 StatusLogger.java:56 - > MiscStage 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,661 StatusLogger.java:56 - > CompactionExecutor 0 0 52 0 > 0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,662 StatusLogger.java:56 - > MemtableReclaimMemory 0 0 21 0 > 0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,662 StatusLogger.java:56 - > PendingRangeCalculator 0 0 8 0 > 0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,663 StatusLogger.java:56 - > GossipStage 0 0 476 0 > 0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,663 StatusLogger.java:56 - > SecondaryIndexManagement 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,664 StatusLogger.java:56 - > HintsDispatcher 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,664 StatusLogger.java:56 - > MigrationStage 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,665 StatusLogger.java:56 - > MemtablePostFlush 0 0 40 0 > 0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,666 StatusLogger.java:56 - > ValidationExecutor 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,666 StatusLogger.java:56 - > Sampler 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,667 StatusLogger.java:56 - > MemtableFlushWriter 0 0 21 0 > 0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,668 StatusLogger.java:56 - > InternalResponseStage 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,669 StatusLogger.java:56 - > AntiEntropyStage 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,670 StatusLogger.java:56 - > CacheCleanupExecutor 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,671 StatusLogger.java:56 - > Native-Transport-Requests 0 0 116 0 > 0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,671 StatusLogger.java:66 - > CompactionManager 0 1 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,671 StatusLogger.java:78 - > MessagingService n/a 0/0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,671 StatusLogger.java:88 - Cache > Type Size Capacity > KeysToSave > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,671 StatusLogger.java:90 - > KeyCache 3160 104857600 > all > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,671 StatusLogger.java:96 - > RowCache 0 0 > all > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,672 StatusLogger.java:103 - > Table Memtable ops,data > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,672 StatusLogger.java:106 - > system_distributed.parent_repair_history 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,672 StatusLogger.java:106 - > system_distributed.repair_history 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,672 StatusLogger.java:106 - > system.compaction_history 2,384 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,672 StatusLogger.java:106 - > system.hints 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,672 StatusLogger.java:106 - > system.schema_aggregates 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 - > system.IndexInfo 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 - > system.schema_columnfamilies 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 - > system.schema_triggers 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 - > system.size_estimates 2120,252650 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 - > system.schema_functions 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 - > system.paxos 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 - > system.views_builds_in_progress 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 - > system.built_views 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,674 StatusLogger.java:106 - > system.peer_events 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,674 StatusLogger.java:106 - > system.range_xfers 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,674 StatusLogger.java:106 - > system.peers 97,46227 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,674 StatusLogger.java:106 - > system.batches 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,674 StatusLogger.java:106 - > system.schema_keyspaces 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,674 StatusLogger.java:106 - > system.schema_usertypes 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,674 StatusLogger.java:106 - > system.local 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 - > system.sstable_activity 28,1100 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 - > system.available_ranges 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 - > system.batchlog 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 - > system.schema_columns 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 - > fleetstream.data 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 - > system_schema.columns 168,25858 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 - > system_schema.types 2,16 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 - > system_schema.indexes 2,16 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,676 StatusLogger.java:106 - > system_schema.keyspaces 4,242 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,676 StatusLogger.java:106 - > system_schema.dropped_columns 2,16 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,676 StatusLogger.java:106 - > system_schema.aggregates 2,16 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,676 StatusLogger.java:106 - > system_schema.triggers 2,16 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,676 StatusLogger.java:106 - > system_schema.tables 27,17400 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,676 StatusLogger.java:106 - > system_schema.views 2,16 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,676 StatusLogger.java:106 - > system_schema.functions 2,16 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,677 StatusLogger.java:106 - > test.data 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,677 StatusLogger.java:106 - > system_auth.roles 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,677 StatusLogger.java:106 - > system_auth.role_members 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,677 StatusLogger.java:106 - > system_auth.resource_role_permissons_index 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,677 StatusLogger.java:106 - > system_auth.role_permissions 0,0 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,677 StatusLogger.java:106 - > system_traces.sessions 58,9106 > INFO [ScheduledTasks:1] 2016-10-05 12:51:47,677 StatusLogger.java:106 - > system_traces.events 320,62299 > {code} > nodetool tpstats on one of the two nodes that drop the traces shows: > {code} > nodetool tpstats > Pool Name Active Pending Completed Blocked All > time blocked > MutationStage 0 0 32911 0 > 0 > ViewMutationStage 0 0 0 0 > 0 > ReadStage 0 0 1065 0 > 0 > RequestResponseStage 0 0 10969 0 > 0 > ReadRepairStage 0 0 6 0 > 0 > CounterMutationStage 0 0 0 0 > 0 > MiscStage 0 0 0 0 > 0 > CompactionExecutor 0 0 972 0 > 0 > MemtableReclaimMemory 0 0 21 0 > 0 > PendingRangeCalculator 0 0 7 0 > 0 > GossipStage 0 0 6278 0 > 0 > SecondaryIndexManagement 0 0 0 0 > 0 > HintsDispatcher 0 0 0 0 > 0 > MigrationStage 0 0 0 0 > 0 > MemtablePostFlush 0 0 40 0 > 0 > ValidationExecutor 0 0 0 0 > 0 > Sampler 0 0 0 0 > 0 > MemtableFlushWriter 0 0 21 0 > 0 > InternalResponseStage 0 0 0 0 > 0 > AntiEntropyStage 0 0 0 0 > 0 > CacheCleanupExecutor 0 0 0 0 > 0 > Native-Transport-Requests 0 0 209 0 > 0 > Message type Dropped > READ 0 > RANGE_SLICE 0 > _TRACE 2789 > HINT 0 > MUTATION 0 > COUNTER_MUTATION 0 > BATCH_STORE 0 > BATCH_REMOVE 0 > REQUEST_RESPONSE 0 > PAGED_RANGE 0 > READ_REPAIR 0 > {code} > It should not be a memory problem because my nodes have ~20GB of free RAM: > {code} > total used free shared buffers cached > Mem: 49450072 27703928 21746144 944 181696 17040856 > -/+ buffers/cache: 10481376 38968696 > Swap: 50319356 148 50319208 > total used free shared buffers cached > Mem: 49450072 30405324 19044748 1124 225256 17627620 > -/+ buffers/cache: 12552448 36897624 > Swap: 50319356 0 50319356 > {code} > I tried with default fetchSize (5000) as well as fetchSize 50000 - both did > not help. Restarting all C* services did not help either. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org