[ 
https://issues.apache.org/jira/browse/KNOX-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301294#comment-17301294
 ] 

ASF subversion and git services commented on KNOX-2549:
-------------------------------------------------------

Commit d0d5417d22f359d5910d934c37175fa00213b694 in knox's branch 
refs/heads/master from Sandor Molnar
[ https://gitbox.apache.org/repos/asf?p=knox.git;h=d0d5417 ]

KNOX-2549 - Fetching audit events from CM beyond previous query timestamp (#412)

To mitigate the chance of losing relevant audit events from CM, Knox needs to 
look back a little bit more than the last query timestamp. To be able to do 
this without processing relevant events again and again a cache was introduced 
to have them in memory for a short time.
Additionally, I removed made the maximum audit event resulset size so that 
every audit events are fetched from CM within the given time interval.

> Knox CM discovery may lose relevant audit events
> ------------------------------------------------
>
>                 Key: KNOX-2549
>                 URL: https://issues.apache.org/jira/browse/KNOX-2549
>             Project: Apache Knox
>          Issue Type: Bug
>          Components: cm-discovery
>    Affects Versions: 1.5.0
>            Reporter: Sandor Molnar
>            Assignee: Sandor Molnar
>            Priority: Major
>             Fix For: 1.6.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
>  *Test steps:*
> 1. Configure Knox service discovery; Knox is in HA
> 1. Updated hive on tez-cm config {{hive.server2.transport.mode}} from binary 
> to http
> 2. Restarted {{Hive on Tez}} service
> *Expected:*
> After 2 mins, check if the HIVE service have the updated endpoint URL in the 
> relevant topology 
> *{color:#de350b}Issue:{color}*
> HIVE endpoint URL is updated on only one of the Knox instances.
>  
> Let's see the relevant gateway logs and the related commands from CM of my 
> new findings:
> {noformat}
> gateway.log - knox-1-host
> -----------------------------
> 2020-12-03 00:32:23,751 DEBUG discovery.cm 
> (PollingConfigurationAnalyzer.java:run(182)) - Checking Cluster 1 @ 
> https://my-cm-host:7183 for configuration changes...
> 2020-12-03 00:32:23,769 DEBUG discovery.cm 
> (PollingConfigurationAnalyzer.java:getRelevantEvents(393)) - Querying 
> configuration activation events from Cluster 1 @ https://my-cm-host:7183 
> since 2020-12-03T08:31:23.681Z
> 2020-12-03 00:32:23,861 DEBUG discovery.cm 
> (PollingConfigurationAnalyzer.java:isRelevantEvent(422)) - Activation event 
> relevance: e8cd499f-49ad-4412-9d3a-7a4cbad3e5c2 = false
> 2020-12-03 00:32:23,861 DEBUG discovery.cm 
> (PollingConfigurationAnalyzer.java:isRelevantEvent(422)) - Activation event 
> relevance: d3176932-8c50-4c42-879c-7e25ae9e2cbd = false
> 2020-12-03 00:32:32,042 DEBUG knox.gateway 
> (ClouderaManagerDescriptorMonitor.java:monitorClouderaManagerDescriptor(96)) 
> - Skipping CM descriptor monitoring of 
> /var/lib/knox/gateway/conf/descriptors/cdp-resources.cm (force = false; 
> lastReloadTime = 2020-12-03T08:20:24.486931Z; lastModified = 
> 2020-12-03T08:20:24.486931Z)
> ...
> 2020-12-03 00:33:23,862 DEBUG discovery.cm 
> (PollingConfigurationAnalyzer.java:run(182)) - Checking Cluster 1 @ 
> https://my-cm-host:7183 for configuration changes...
> 2020-12-03 00:33:23,874 DEBUG discovery.cm 
> (PollingConfigurationAnalyzer.java:getRelevantEvents(393)) - Querying 
> configuration activation events from Cluster 1 @ https://my-cm-host:7183 
> since 2020-12-03T08:32:23.769Z
> 2020-12-03 00:33:23,938 DEBUG discovery.cm 
> (PollingConfigurationAnalyzer.java:getRelevantEvents(402)) - There is no any 
> activation event found within the given time period
> gateway.log - knox-host-2
> -----------------------------
> 2020-12-03 00:32:07,745 DEBUG discovery.cm 
> (PollingConfigurationAnalyzer.java:run(182)) - Checking Cluster 1 @ 
> https://my-cm-host:7183 for configuration changes...
> 2020-12-03 00:32:07,920 DEBUG discovery.cm 
> (PollingConfigurationAnalyzer.java:getRelevantEvents(393)) - Querying 
> configuration activation events from Cluster 1 @ https://my-cm-host:7183 
> since 2020-12-03T08:31:07.670Z
> 2020-12-03 00:32:08,039 DEBUG discovery.cm 
> (PollingConfigurationAnalyzer.java:isRelevantEvent(422)) - Activation event 
> relevance: e8cd499f-49ad-4412-9d3a-7a4cbad3e5c2 = false
> 2020-12-03 00:32:08,043 DEBUG discovery.cm 
> (PollingConfigurationAnalyzer.java:isRelevantEvent(422)) - Activation event 
> relevance: d3176932-8c50-4c42-879c-7e25ae9e2cbd = false
> ...
> 2020-12-03 00:33:08,045 DEBUG discovery.cm 
> (PollingConfigurationAnalyzer.java:run(182)) - Checking Cluster 1 @ 
> https://my-cm-host:7183 for configuration changes...
> 2020-12-03 00:33:08,083 DEBUG discovery.cm 
> (PollingConfigurationAnalyzer.java:getRelevantEvents(393)) - Querying 
> configuration activation events from Cluster 1 @ https://my-cm-host:7183 
> since 2020-12-03T08:32:07.922Z
> 2020-12-03 00:33:08,184 DEBUG discovery.cm 
> (PollingConfigurationAnalyzer.java:isRelevantEvent(422)) - Activation event 
> relevance: 7537adcb-d7da-4591-aa66-1f857584c099 = true
> AUDIT events fetched from CM
> ----------------------------
> https://my-cm-host:7183/api/v41/events?query=category==AUDIT_EVENT;timeOccurred=gt=2020-12-03T08:31:23.681Z;timeOccurred=lt=2020-12-03T08:32:23.769Z
> {
>       totalResults: 3,
>       items: [{
>                       id: "7537adcb-d7da-4591-aa66-1f857584c099",
>                       content: "Command Start on subject ATLAS-1 succeeded.",
>                       timeOccurred: "2020-12-03T08:32:20.366Z",
>                       timeReceived: "2020-12-03T08:32:20.369Z",
>                       category: "AUDIT_EVENT",
>                       severity: "INFORMATIONAL",
>                       alert: false,
>                       attributes: [{
>                                       values: [
>                                               "SvcCmdArgs{targetRoles=[], 
> args=[]}"
>                                       ],
>                                       name: "COMMAND_ARGS"
>                               },
>                               {
>                                       values: [
>                                               "Successfully started service."
>                                       ],
>                                       name: "COMMAND_RESULT"
>                               },
>                               {
>                                       values: [
>                                               "556"
>                                       ],
>                                       name: "COMMAND_ID"
>                               },
>                               {
>                                       values: [
>                                               "clouderaManager"
>                                       ],
>                                       name: "USER"
>                               },
>                               {
>                                       values: [
>                                               "Cluster 1"
>                                       ],
>                                       name: "CLUSTER_DISPLAY_NAME"
>                               },
>                               {
>                                       values: [
>                                               
> "https://my-cm-host5003/cmf/eventRedirect/7537adcb-d7da-4591-aa66-1f857584c099";
>                                       ],
>                                       name: "URL"
>                               },
>                               {
>                                       values: [
>                                               "ATLAS-1"
>                                       ],
>                                       name: "SERVICE_DISPLAY_NAME"
>                               },
>                               {
>                                       values: [
>                                               "SUCCEEDED"
>                                       ],
>                                       name: "COMMAND_STATUS"
>                               },
>                               {
>                                       values: [
>                                               "EV_SERVICE_STARTED"
>                                       ],
>                                       name: "EVENTCODE"
>                               },
>                               {
>                                       values: [
>                                               "false"
>                                       ],
>                                       name: "ALERT_SUPPRESSED"
>                               },
>                               {
>                                       values: [
>                                               "Start"
>                                       ],
>                                       name: "COMMAND"
>                               },
>                               {
>                                       values: [
>                                               "ATLAS-1"
>                                       ],
>                                       name: "SERVICE"
>                               },
>                               {
>                                       values: [
>                                               "ATLAS"
>                                       ],
>                                       name: "SERVICE_TYPE"
>                               },
>                               {
>                                       values: [
>                                               "1"
>                                       ],
>                                       name: "CLUSTER_ID"
>                               },
>                               {
>                                       values: [
>                                               "Cluster 1"
>                                       ],
>                                       name: "CLUSTER"
>                               },
>                               {
>                                       values: [
>                                               "COMMAND_SUCCEEDED_WITH_TARGET"
>                                       ],
>                                       name: "MESSAGE_CODES"
>                               }
>                       ]
>               },
>               {
>                       id: "e8cd499f-49ad-4412-9d3a-7a4cbad3e5c2",
>                       content: "User admin created a new revision. Message 
> was: Modified Server HTTPS Port.",
>                       timeOccurred: "2020-12-03T08:32:01.769Z",
>                       timeReceived: "2020-12-03T08:32:01.772Z",
>                       category: "AUDIT_EVENT",
>                       severity: "INFORMATIONAL",
>                       alert: false,
>                       attributes: [{
>                                       values: [
>                                               "257"
>                                       ],
>                                       name: "REVISION"
>                               },
>                               {
>                                       values: [
>                                               "admin"
>                                       ],
>                                       name: "USER"
>                               },
>                               {
>                                       values: [
>                                               "Cluster 1"
>                                       ],
>                                       name: "CLUSTER_DISPLAY_NAME"
>                               },
>                               {
>                                       values: [
>                                               
> "https://knox-host-2:5003/cmf/eventRedirect/e8cd499f-49ad-4412-9d3a-7a4cbad3e5c2";
>                                       ],
>                                       name: "URL"
>                               },
>                               {
>                                       values: [
>                                               
> "ATLAS-1-ATLAS_SERVER-353d9af9c42d35ad516e1b1ff11f3a63"
>                                       ],
>                                       name: "ROLE"
>                               },
>                               {
>                                       values: [
>                                               "ATLAS-1-ATLAS_SERVER-BASE"
>                                       ],
>                                       name: "ROLE_CONFIG_GROUP"
>                               },
>                               {
>                                       values: [
>                                               "ATLAS-1"
>                                       ],
>                                       name: "SERVICE_DISPLAY_NAME"
>                               },
>                               {
>                                       values: [
>                                               "EV_REVISION_CREATED"
>                                       ],
>                                       name: "EVENTCODE"
>                               },
>                               {
>                                       values: [
>                                               "false"
>                                       ],
>                                       name: "ALERT_SUPPRESSED"
>                               },
>                               {
>                                       values: [
>                                               "ATLAS-1"
>                                       ],
>                                       name: "SERVICE"
>                               },
>                               {
>                                       values: [
>                                               "ATLAS"
>                                       ],
>                                       name: "SERVICE_TYPE"
>                               },
>                               {
>                                       values: [
>                                               "1"
>                                       ],
>                                       name: "CLUSTER_ID"
>                               },
>                               {
>                                       values: [
>                                               "Cluster 1"
>                                       ],
>                                       name: "CLUSTER"
>                               },
>                               {
>                                       values: [
>                                               "REVISION_CREATED"
>                                       ],
>                                       name: "MESSAGE_CODES"
>                               }
>                       ]
>               },
>               {
>                       id: "d3176932-8c50-4c42-879c-7e25ae9e2cbd",
>                       content: "User admin started service Start.",
>                       timeOccurred: "2020-12-03T08:31:58.360Z",
>                       timeReceived: "2020-12-03T08:31:58.361Z",
>                       category: "AUDIT_EVENT",
>                       severity: "INFORMATIONAL",
>                       alert: false,
>                       attributes: [{
>                                       values: [
>                                               
> "SvcCmdArgs{targetRoles=[DbRole{id=65, 
> name=ATLAS-1-ATLAS_SERVER-353d9af9c42d35ad516e1b1ff11f3a63, 
> hostName=knox-host-2}], args=[]}"
>                                       ],
>                                       name: "COMMAND_ARGS"
>                               },
>                               {
>                                       values: [
>                                               "556"
>                                       ],
>                                       name: "COMMAND_ID"
>                               },
>                               {
>                                       values: [
>                                               "admin"
>                                       ],
>                                       name: "USER"
>                               },
>                               {
>                                       values: [
>                                               "Cluster 1"
>                                       ],
>                                       name: "CLUSTER_DISPLAY_NAME"
>                               },
>                               {
>                                       values: [
>                                               
> "https://knox-host-2:5003/cmf/eventRedirect/d3176932-8c50-4c42-879c-7e25ae9e2cbd";
>                                       ],
>                                       name: "URL"
>                               },
>                               {
>                                       values: [
>                                               "ATLAS-1"
>                                       ],
>                                       name: "SERVICE_DISPLAY_NAME"
>                               },
>                               {
>                                       values: [
>                                               "STARTED"
>                                       ],
>                                       name: "COMMAND_STATUS"
>                               },
>                               {
>                                       values: [
>                                               "EV_SERVICE_STARTED"
>                                       ],
>                                       name: "EVENTCODE"
>                               },
>                               {
>                                       values: [
>                                               "false"
>                                       ],
>                                       name: "ALERT_SUPPRESSED"
>                               },
>                               {
>                                       values: [
>                                               "Start"
>                                       ],
>                                       name: "COMMAND"
>                               },
>                               {
>                                       values: [
>                                               "ATLAS-1"
>                                       ],
>                                       name: "SERVICE"
>                               },
>                               {
>                                       values: [
>                                               "ATLAS"
>                                       ],
>                                       name: "SERVICE_TYPE"
>                               },
>                               {
>                                       values: [
>                                               "1"
>                                       ],
>                                       name: "CLUSTER_ID"
>                               },
>                               {
>                                       values: [
>                                               "Cluster 1"
>                                       ],
>                                       name: "CLUSTER"
>                               },
>                               {
>                                       values: [
>                                               "SERVICE_STARTED"
>                                       ],
>                                       name: "MESSAGE_CODES"
>                               }
>                       ]
>               }
>       ]
> }
> {noformat}
> On {{knox-host-2}} everything worked as expected. However, on 
> {{knox-host-1}}, my ATLAS service configuration change got lost:
>  - The start command successfully finished on {{2020-12-03T08:32:20.366Z}}
>  - Polling configuration analyzer fetched audit events from CM server at 
> {{2020-12-03 08:32:23,769Z}} since {{2020-12-03T08:31:23.681Z}}. However, CM 
> did +_not_+ return the start command that finished 3 seconds ago, even it 
> should have returned it
>  - A minute later, the polling configuration analyzer fetched audit events 
> from CM server at {{2020-12-03 08:33:23,874Z}} since 
> {{2020-12-03T08:32:23.769Z}} (the last query timestamp). This query returned 
> nothing which is correct since there was no activation within this time frame
> As a result, the command with ID of {{7537adcb-d7da-4591-aa66-1f857584c099}} 
> has been lost on {{knox-host-1}}. It's important to note that the first 
> timestamp, when Knox issued a CM query, was sort of close to the timestamp 
> when the successful start event occurred. There seems to be a lag between the 
> CM server/agent communication which may cause this issue. This means Knox 
> should be smarter in this respect and try to mitigate the chance of missing 
> relevant audit events.
> I'm going to implement the following approach:
>  - Knox will look back a little bit more time than is does currently: instead 
> of checking the {{[lastQueryTimestamp -> currentTimestamp]}} interval, we'll 
> check {{[lastQueryTimestamp - (2 x pollingInterval) -> currentTimestamp]}}
>  - Knox will maintain a cache of all relevant events that it processed to 
> avoid re-processing them



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to