[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581391#comment-16581391 ] ASF GitHub Bot commented on METRON-1732: Github user mmiklavc commented on the issue: https://github.com/apache/metron/pull/1157 Closing as merged > Fix job status liveness bug and parallelize finalizer file writing > -- > > Key: METRON-1732 > URL: https://issues.apache.org/jira/browse/METRON-1732 > Project: Metron > Issue Type: Sub-task >Reporter: Michael Miklavcic >Assignee: Michael Miklavcic >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581392#comment-16581392 ]

ASF GitHub Bot commented on METRON-1732:

Github user mmiklavc closed the pull request at:

    https://github.com/apache/metron/pull/1157
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581344#comment-16581344 ]

ASF GitHub Bot commented on METRON-1732:

Github user mmiklavc commented on the issue:

    https://github.com/apache/metron/pull/1157

Awesome! Thanks for the review and smoke test @nickwallen and @merrimanr. I am going to go ahead and merge this.
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581321#comment-16581321 ]

ASF GitHub Bot commented on METRON-1732:

Github user nickwallen commented on the issue:

    https://github.com/apache/metron/pull/1157

+1 I ran this up successfully. Validated...

- [x] Alerts visible in the UI
- [x] Metron Service Check
- [x] Capture PCAP in HDFS
- [x] Read PCAP from HDFS using CLI.
- [x] Able to open resulting pcap file with `tshark -r `.
- [x] Read PCAP from HDFS using PCAP UI
- [x] Download PCAP from UI and open in Wireshark GUI.

![screen shot 2018-08-15 at 12 29 47 pm](https://user-images.githubusercontent.com/2475409/44160249-d5a21b00-a087-11e8-94b7-b5fd1daec8d9.png)
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581314#comment-16581314 ]

ASF GitHub Bot commented on METRON-1732:

Github user merrimanr commented on the issue:

    https://github.com/apache/metron/pull/1157

Please disregard. I failed to deploy the Ambari changes correctly.
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581308#comment-16581308 ]

ASF GitHub Bot commented on METRON-1732:

Github user merrimanr commented on the issue:

    https://github.com/apache/metron/pull/1157

I ran a quick test in REST and it looks like the status never gets to `SUCCEEDED`. Here is my request:

```
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{}' 'http://node1:8082/api/v1/pcap/fixed'
```

After the job finishes (looking at the RM UI), the status is:

```
{
  "jobId": "job_1533831319048_0046",
  "jobStatus": "FINALIZING",
  "description": "Finalizing job.",
  "percentComplete": 75,
  "pageTotal": 0
}
```

If I keep requesting status it always returns this.
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580247#comment-16580247 ]

ASF GitHub Bot commented on METRON-1732:

Github user mmiklavc commented on the issue:

    https://github.com/apache/metron/pull/1157

Note - I also added a small blurb about pcap page size to the README.md alongside the notes on setting the finalizer threads. This was missed previously.
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580241#comment-16580241 ]

ASF GitHub Bot commented on METRON-1732:

Github user mmiklavc commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1157#discussion_r210059160

    --- Diff: metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/mr/PcapJob.java ---
    @@ -307,8 +307,11 @@ public void setCompleteCheckInterval(long interval) {
           }
           return this;
         }
    -    mrJob.submit();
    -    jobStatus.withState(State.SUBMITTED).withDescription("Job submitted").withJobId(mrJob.getJobID().toString());
    +    synchronized (this) {
    --- End diff --

FYI, it turns out I was right the first time around. Synchronization is necessary for visibility in the timer thread that is started after these modifications. I've updated the comments in code to describe this.
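The visibility argument in the comment above can be illustrated with a small, generic sketch. This is not the `PcapJob` code: `StatusHolder`, its string states, and the 10 ms timer delay are hypothetical stand-ins. It only shows the pattern of writing and reading shared status under one lock, so that a timer thread and any caller of the getter agree on the latest value.

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical stand-in for a job whose status is updated by a timer thread
// and read by other threads. Not the actual PcapJob implementation.
class StatusHolder {
  private String state = "NOT_RUNNING";          // guarded by "this"
  private final Timer timer = new Timer(true);   // daemon timer thread

  public void submit() {
    // Write under the same lock that readers use, so the update is
    // guaranteed to be visible to any thread that later takes the lock.
    synchronized (this) {
      state = "SUBMITTED";
    }
    timer.schedule(new TimerTask() {
      @Override
      public void run() {
        // The timer thread reads and updates the state under the same lock.
        synchronized (StatusHolder.this) {
          if ("SUBMITTED".equals(state)) {
            state = "RUNNING";
          }
        }
      }
    }, 10L);
  }

  // Readers take the lock as well; without it they might see a stale value.
  public synchronized String getState() {
    return state;
  }

  public void shutdown() {
    timer.cancel();
  }
}
```

The lock here exists for visibility rather than to protect a compound update, which matches the motivation given in the review thread.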
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580238#comment-16580238 ]

ASF GitHub Bot commented on METRON-1732:

Github user mmiklavc commented on the issue:

    https://github.com/apache/metron/pull/1157

**Testing**

Test plan pulled from here - https://github.com/apache/metron/pull/1081#issuecomment-400556832

Get PCAP data into Metron:

1. Install and set up pycapa (this has been updated in master recently) - https://github.com/apache/metron/blob/master/metron-sensors/pycapa/README.md#centos-6
2. (if using singlenode vagrant) Kill the enrichment, profiler, indexing, and sensor topologies via `for i in bro enrichment random_access_indexing batch_indexing yaf snort;do storm kill $i;done`
3. Start the pcap topology via `$METRON_HOME/bin/start_pcap_topology.sh`
4. Start the pycapa packet capture producer on eth1 via `/usr/bin/pycapa --producer --topic pcap -i eth1 -k node1:6667`
5. Watch the topology in the Storm UI and kill the packet capture utility from before, when the number of packets ingested is over 3k.
6. Ensure that at least 3 files exist on HDFS by running `hadoop fs -ls /apps/metron/pcap`
7. Choose a file (denoted by $FILE) and dump a few of the contents using the pcap_inspector utility via `$METRON_HOME/bin/pcap_inspector.sh -i $FILE -n 5`
8. Choose one of the lines and note the protocol.
9. Note that when you run the commands below, the resulting file will be placed in the directory from which you kicked off the job.

### Fixed filter

1. Run a fixed filter query by executing the following command with the values noted above (match your start_time format to the date format provided - default is to use millis since epoch)
2. `$METRON_HOME/bin/pcap_query.sh fixed -st -df "MMdd" -p -rpf 500`
3. Verify the MR job finishes successfully. Upon completion, you should see multiple files named with relatively current datestamps in your current directory, e.g. pcap-data-20160617160549737+.pcap
4. Copy the files to your local machine and verify you can open them in Wireshark. I chose a middle file and the last file. The middle file should have 500 records (per the records_per_file option), and the last one will likely have a number of records <= 500.

### Query filter

1. Run a Stellar query filter query by executing a command similar to the following, with the values noted above (match your start_time format to the date format provided - default is to use millis since epoch)
2. `$METRON_HOME/bin/pcap_query.sh query -st "20160617" -df "MMdd" -query "protocol == '6'" -rpf 500`
3. Verify the MR job finishes successfully. Upon completion, you should see multiple files named with relatively current datestamps in your current directory, e.g. pcap-data-20160617160549737+.pcap
4. Copy the files to your local machine and verify you can open them in Wireshark. I chose a middle file and the last file. The middle file should have 500 records (per the records_per_file option), and the last one will likely have a number of records <= 500.

Also run riffs on the fixed query via the Metron Alerts UI PCAP query panel.
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578642#comment-16578642 ]

ASF GitHub Bot commented on METRON-1732:

Github user mmiklavc commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1157#discussion_r209689930

    --- Diff: metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/mr/PcapJob.java ---
    @@ -307,8 +307,11 @@ public void setCompleteCheckInterval(long interval) {
           }
           return this;
         }
    -    mrJob.submit();
    -    jobStatus.withState(State.SUBMITTED).withDescription("Job submitted").withJobId(mrJob.getJobID().toString());
    +    synchronized (this) {
    --- End diff --

Will do. This lock is about thread visibility as opposed to actual issues with concurrent modification. It may be that this lock is not needed with getStatus being synchronized. I will double check and report back via modified code and/or code comment on this.
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578627#comment-16578627 ]

ASF GitHub Bot commented on METRON-1732:

Github user mmiklavc commented on the issue:

    https://github.com/apache/metron/pull/1157

Good feedback @nickwallen, I'll make adjustments.
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578626#comment-16578626 ]

ASF GitHub Bot commented on METRON-1732:

Github user mmiklavc commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1157#discussion_r209687780

    --- Diff: metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/finalizer/PcapFinalizer.java ---
    @@ -99,10 +104,55 @@ protected PcapResultsWriter getResultsWriter() {
             LOG.warn("Unable to cleanup files in HDFS", e);
           }
         }
    +    LOG.info("Done finalizing results");
         return new PcapPages(outFiles);
       }
    -  protected abstract void write(PcapResultsWriter resultsWriter, Configuration hadoopConfig, List data, Path outputPath) throws IOException;
    +  /**
    +   * Figure out how many threads to use in the thread pool. If it's a string and ends with "C",
    +   * then strip the C and treat it as an integral multiple of the number of cores. If it's a
    +   * string and does not end with a C, then treat it as a number in string form.
    +   */
    +  private static int getNumThreads(String numThreads) {
    +    String numThreadsStr = ((String) numThreads).trim().toUpperCase();
    +    if (numThreadsStr.endsWith("C")) {
    +      Integer factor = Integer.parseInt(numThreadsStr.replace("C", ""));
    +      return factor * Runtime.getRuntime().availableProcessors();
    +    } else {
    +      return Integer.parseInt(numThreadsStr);
    +    }
    +  }
    +
    +  protected List writeParallel(Configuration hadoopConfig, Map> toWrite,
    +      int parallelism) throws IOException {
    +    List outFiles = Collections.synchronizedList(new ArrayList<>());
    +    ForkJoinPool tp = new ForkJoinPool(parallelism);
    +    try {
    +      tp.submit(() -> {
    +        toWrite.entrySet().parallelStream().forEach(e -> {
    --- End diff --

As I understand it, submit is effectively submitting the set of tasks for the parallel stream to execute within this threadpool, e.g. https://www.baeldung.com/java-8-parallel-streams-custom-threadpool.

As a side note, the reason for a custom threadpool at all is so that this doesn't cause issues with other streams, since the default in Java is to use a global context for this sort of thing. Liveness issues may arise when using the shared global context.
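The technique described in the comment above, running a parallel stream from inside a task submitted to a dedicated `ForkJoinPool`, can be sketched in isolation as follows. The class `ParallelWriteSketch`, the use of `String` keys in place of HDFS paths, and the omission of the actual write are all assumptions made for illustration. As far as I know, the fact that the stream then runs in the custom pool is widely relied upon but is an implementation behavior rather than something the Stream API documentation promises.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;

// Hypothetical sketch: keys stand in for output paths, values for page data.
class ParallelWriteSketch {
  static List<String> writeAll(Map<String, List<Integer>> toWrite, int parallelism) {
    // Several worker threads append concurrently, so the list must be thread-safe.
    List<String> written = Collections.synchronizedList(new ArrayList<>());
    ForkJoinPool pool = new ForkJoinPool(parallelism);
    try {
      // One submit() call: the parallel stream started inside the task
      // forks its work into this pool instead of the shared common pool.
      pool.submit(() ->
          toWrite.entrySet().parallelStream().forEach(e -> {
            if (!e.getValue().isEmpty()) {
              // A real finalizer would write e.getValue() to e.getKey() here.
              written.add(e.getKey());
            }
          })
      ).get(); // block until every entry has been processed
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw new RuntimeException("Interrupted while writing results", ie);
    } catch (ExecutionException ee) {
      throw new RuntimeException("Failed to write results", ee.getCause());
    } finally {
      pool.shutdown(); // release the worker threads
    }
    return written;
  }
}
```

Calling `writeAll(pages, 8)` would process the entries with up to eight worker threads; the order of the returned list is not deterministic.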
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578569#comment-16578569 ]

ASF GitHub Bot commented on METRON-1732:

Github user nickwallen commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1157#discussion_r209649851

    --- Diff: metron-interface/metron-rest/README.md ---
    @@ -223,6 +223,9 @@ REST will supply the script with raw pcap data through standard in and expects P
     Pcap query jobs can be configured for submission to a YARN queue. This setting is exposed as the Spring property `pcap.yarn.queue`. If configured, the REST application will set the `mapreduce.job.queuename` Hadoop property to that value.
    +Pcap query jobs have a finalization routine that writes their results out to HDFS in pages. There is a threadpool used for this finalization that can be configured to use a specified number of threads.
    +This setting is exposed as the Spring property `pcap.finalizer.threadpool.size`
    --- End diff --

Can we document the default value for this?
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578570#comment-16578570 ]

ASF GitHub Bot commented on METRON-1732:

Github user nickwallen commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1157#discussion_r209651293

    --- Diff: metron-interface/metron-rest/README.md ---
    @@ -223,6 +223,9 @@ REST will supply the script with raw pcap data through standard in and expects P
     Pcap query jobs can be configured for submission to a YARN queue. This setting is exposed as the Spring property `pcap.yarn.queue`. If configured, the REST application will set the `mapreduce.job.queuename` Hadoop property to that value.
    +Pcap query jobs have a finalization routine that writes their results out to HDFS in pages. There is a threadpool used for this finalization that can be configured to use a specified number of threads.
    +This setting is exposed as the Spring property `pcap.finalizer.threadpool.size`
    --- End diff --

Should we mention that 1C, 4C are valid values in addition to integers? Perhaps just copy the text you have in the Ambari description into the README. Good stuff.
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578574#comment-16578574 ]

ASF GitHub Bot commented on METRON-1732:

Github user nickwallen commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1157#discussion_r209665613

    --- Diff: metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/mr/PcapJob.java ---
    @@ -307,8 +307,11 @@ public void setCompleteCheckInterval(long interval) {
           }
           return this;
         }
    -    mrJob.submit();
    -    jobStatus.withState(State.SUBMITTED).withDescription("Job submitted").withJobId(mrJob.getJobID().toString());
    +    synchronized (this) {
    --- End diff --

Can we add a comment about why we need the lock here?
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578571#comment-16578571 ]

ASF GitHub Bot commented on METRON-1732:

Github user nickwallen commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1157#discussion_r209655011

    --- Diff: metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/finalizer/PcapFinalizer.java ---
    @@ -99,10 +104,55 @@ protected PcapResultsWriter getResultsWriter() {
             LOG.warn("Unable to cleanup files in HDFS", e);
           }
         }
    +    LOG.info("Done finalizing results");
         return new PcapPages(outFiles);
       }
    -  protected abstract void write(PcapResultsWriter resultsWriter, Configuration hadoopConfig, List data, Path outputPath) throws IOException;
    +  /**
    +   * Figure out how many threads to use in the thread pool. If it's a string and ends with "C",
    +   * then strip the C and treat it as an integral multiple of the number of cores. If it's a
    +   * string and does not end with a C, then treat it as a number in string form.
    +   */
    +  private static int getNumThreads(String numThreads) {
    +    String numThreadsStr = ((String) numThreads).trim().toUpperCase();
    +    if (numThreadsStr.endsWith("C")) {
    +      Integer factor = Integer.parseInt(numThreadsStr.replace("C", ""));
    --- End diff --

Should we add a catch block for when a user enters an invalid value? We should catch and provide a helpful exception message like "Invalid value for property 'pcap.finalizer.threadpool.size'; value='3CCC'".
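The validation suggested above could look roughly like the following sketch. `ThreadCountParser` and its `parse` methods are hypothetical names, not the code that was merged. One detail worth noting: `String.replace("C", "")` removes every `C`, so the quoted diff would silently accept `3CCC` as a factor of 3; the sketch strips only the trailing `C` so that such input is rejected with the message the reviewer proposed.

```java
// Hypothetical helper, not the merged Metron code.
class ThreadCountParser {
  // Convenience overload that uses the number of cores on this machine.
  static int parse(String raw) {
    return parse(raw, Runtime.getRuntime().availableProcessors());
  }

  // "4" means 4 threads; "2C" means 2 threads per core.
  static int parse(String raw, int cores) {
    String value = raw == null ? "" : raw.trim().toUpperCase();
    try {
      int threads = value.endsWith("C")
          // Strip only the final "C" so that inputs such as "3CCC" fail to parse.
          ? Math.multiplyExact(Integer.parseInt(value.substring(0, value.length() - 1)), cores)
          : Integer.parseInt(value);
      if (threads < 1) {
        throw new NumberFormatException("thread count must be positive");
      }
      return threads;
    } catch (NumberFormatException | ArithmeticException e) {
      throw new IllegalArgumentException(
          "Invalid value for property 'pcap.finalizer.threadpool.size'; value='" + raw + "'", e);
    }
  }
}
```

Passing the core count as a parameter keeps the parsing logic testable on any machine; the one-argument overload supplies the real value.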
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578572#comment-16578572 ]

ASF GitHub Bot commented on METRON-1732:

Github user nickwallen commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1157#discussion_r209674410

    --- Diff: metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/finalizer/PcapFinalizer.java ---
    @@ -99,10 +104,55 @@ protected PcapResultsWriter getResultsWriter() {
             LOG.warn("Unable to cleanup files in HDFS", e);
           }
         }
    +    LOG.info("Done finalizing results");
         return new PcapPages(outFiles);
       }
    -  protected abstract void write(PcapResultsWriter resultsWriter, Configuration hadoopConfig, List data, Path outputPath) throws IOException;
    +  /**
    +   * Figure out how many threads to use in the thread pool. If it's a string and ends with "C",
    +   * then strip the C and treat it as an integral multiple of the number of cores. If it's a
    +   * string and does not end with a C, then treat it as a number in string form.
    +   */
    +  private static int getNumThreads(String numThreads) {
    +    String numThreadsStr = ((String) numThreads).trim().toUpperCase();
    +    if (numThreadsStr.endsWith("C")) {
    +      Integer factor = Integer.parseInt(numThreadsStr.replace("C", ""));
    +      return factor * Runtime.getRuntime().availableProcessors();
    +    } else {
    +      return Integer.parseInt(numThreadsStr);
    +    }
    +  }
    +
    +  protected List writeParallel(Configuration hadoopConfig, Map> toWrite,
    +      int parallelism) throws IOException {
    +    List outFiles = Collections.synchronizedList(new ArrayList<>());
    +    ForkJoinPool tp = new ForkJoinPool(parallelism);
    +    try {
    +      tp.submit(() -> {
    +        toWrite.entrySet().parallelStream().forEach(e -> {
    --- End diff --

Shouldn't we be calling `tp.submit` for each (path, data)?
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578573#comment-16578573 ]

ASF GitHub Bot commented on METRON-1732:

Github user nickwallen commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1157#discussion_r209671313

    --- Diff: metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/finalizer/PcapFinalizer.java ---
    @@ -99,10 +104,55 @@ protected PcapResultsWriter getResultsWriter() {
             LOG.warn("Unable to cleanup files in HDFS", e);
           }
         }
    +    LOG.info("Done finalizing results");
         return new PcapPages(outFiles);
       }
    -  protected abstract void write(PcapResultsWriter resultsWriter, Configuration hadoopConfig, List data, Path outputPath) throws IOException;
    +  /**
    +   * Figure out how many threads to use in the thread pool. If it's a string and ends with "C",
    +   * then strip the C and treat it as an integral multiple of the number of cores. If it's a
    +   * string and does not end with a C, then treat it as a number in string form.
    +   */
    +  private static int getNumThreads(String numThreads) {
    +    String numThreadsStr = ((String) numThreads).trim().toUpperCase();
    +    if (numThreadsStr.endsWith("C")) {
    +      Integer factor = Integer.parseInt(numThreadsStr.replace("C", ""));
    +      return factor * Runtime.getRuntime().availableProcessors();
    +    } else {
    +      return Integer.parseInt(numThreadsStr);
    +    }
    +  }
    +
    +  protected List writeParallel(Configuration hadoopConfig, Map> toWrite,
    +      int parallelism) throws IOException {
    +    List outFiles = Collections.synchronizedList(new ArrayList<>());
    +    ForkJoinPool tp = new ForkJoinPool(parallelism);
    +    try {
    +      tp.submit(() -> {
    +        toWrite.entrySet().parallelStream().forEach(e -> {
    +          try {
    +            Path path = e.getKey();
    +            List data = e.getValue();
    +            if (data.size() > 0) {
    +              write(getResultsWriter(), hadoopConfig, data, path);
    +              outFiles.add(path);
    +            }
    +          } catch (IOException ioe) {
    +            throw new RuntimeException("Failed to write results", ioe);
    --- End diff --

Can we add the path that failed to write to the exception message?
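One way to act on the request above is to include the failing path when the checked `IOException` is rethrown from inside the lambda. The sketch below is illustrative only: `WriteFailureSketch`, the `PageWriter` interface, and the use of a plain `String` for the path are hypothetical, and `UncheckedIOException` is swapped in for the `RuntimeException` used in the quoted diff.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of wrapping a checked IOException with path context.
class WriteFailureSketch {
  // Stand-in for whatever actually writes one page of results.
  interface PageWriter {
    void write(String path) throws IOException;
  }

  static void writeOrExplain(PageWriter writer, String path) {
    try {
      writer.write(path);
    } catch (IOException ioe) {
      // Lambdas passed to forEach cannot throw checked exceptions, so rethrow
      // unchecked, keeping the original cause and naming the path that failed.
      throw new UncheckedIOException("Failed to write results to path '" + path + "'", ioe);
    }
  }
}
```

With many pages written in parallel, naming the path in the message makes it possible to tell which output file was affected without correlating log timestamps.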
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578568#comment-16578568 ]

ASF GitHub Bot commented on METRON-1732:

Github user nickwallen commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1157#discussion_r209650724

    --- Diff: metron-interface/metron-rest/README.md ---
    @@ -223,6 +223,9 @@ REST will supply the script with raw pcap data through standard in and expects P
     Pcap query jobs can be configured for submission to a YARN queue. This setting is exposed as the Spring property `pcap.yarn.queue`. If configured, the REST application will set the `mapreduce.job.queuename` Hadoop property to that value.
    +Pcap query jobs have a finalization routine that writes their results out to HDFS in pages. There is a threadpool used for this finalization that can be configured to use a specified number of threads.
    +This setting is exposed as the Spring property `pcap.finalizer.threadpool.size`
    --- End diff --

Do you have any advice on when a user should increase/decrease this value? Are there errors I might see that would be resolved by increasing/decreasing this value? If you don't have a good understanding of this, then we don't need to worry about it.
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578426#comment-16578426 ]

ASF GitHub Bot commented on METRON-1732:

Github user nickwallen commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1157#discussion_r209649720

    --- Diff: metron-interface/metron-rest/README.md ---
    @@ -223,6 +223,9 @@ REST will supply the script with raw pcap data through standard in and expects P
     Pcap query jobs can be configured for submission to a YARN queue. This setting is exposed as the Spring property `pcap.yarn.queue`. If configured, the REST application will set the `mapreduce.job.queuename` Hadoop property to that value.
    +Pcap query jobs have a finalization routine that writes their results out to HDFS in pages. There is a threadpool used for this finalization that can be configured to use a specified number of threads.
    +This setting is exposed as the Spring property `pcap.finalizer.threadpool.size`
    --- End diff --

Can we document the default value for this?
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577320#comment-16577320 ]

ASF GitHub Bot commented on METRON-1732:

Github user mmiklavc commented on the issue:

    https://github.com/apache/metron/pull/1157

This PR also updates the status reporting so that the finalizer accounts for 25% of the reported progress.

In local testing, a query via the Alerts PCAP UI with a small page size (10 results per page), which resulted in 7,299 pages, took 15 minutes with parallelism set to 1. With parallelism set to 8, it went down to 2-3 minutes.
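The progress split mentioned above can be read as a simple weighting, sketched below. Only the 25% share for the finalizer comes from the comment; the class name `ProgressSketch`, the linear combination, and the clamping are assumptions made for illustration and may not match how the PR computes the value.

```java
// Hypothetical sketch of weighting MapReduce progress against finalizer progress.
class ProgressSketch {
  // Fractions are expected in [0, 1]; the result is a percentage in [0, 100].
  static double percentComplete(double mapReduceFraction, double finalizerFraction) {
    // Assumed weighting: 75% for the MapReduce job, 25% for the finalizer.
    return 75.0 * clamp(mapReduceFraction) + 25.0 * clamp(finalizerFraction);
  }

  private static double clamp(double fraction) {
    return Math.max(0.0, Math.min(1.0, fraction));
  }
}
```

Under this reading, a job whose MapReduce phase has finished but whose finalizer has not yet made progress would report 75 percent.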
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575759#comment-16575759 ]

ASF GitHub Bot commented on METRON-1732:

GitHub user mmiklavc opened a pull request:

    https://github.com/apache/metron/pull/1157

    METRON-1732: Fix job status liveness bug and parallelize finalizer file writing

## Contributor Comments

This still needs to have the # of finalizer threads option exposed for the REST application, but since it's multi-threaded code I wanted to get the review process started while I finish that part up.

Test plan and more detailed description to follow.

## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.
Please refer to our [Development Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235) for the complete guide to follow for contributions.
Please refer also to our [Build Verification Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview) for complete smoke testing guides.

In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following:

### For all changes:

- [ ] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [ ] Does your PR title start with METRON- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
- [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?

### For code changes:

- [ ] Have you included steps to reproduce the behavior or problem that is being changed or addressed?
- [ ] Have you included steps or a guide to how the change may be verified and tested manually?
- [ ] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:

  ```
  mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh
  ```

- [ ] Have you written or updated unit tests and or integration tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?

### For documentation related changes:

- [ ] Have you ensured that format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not then run the following commands and then verify changes via `site-book/target/site/index.html`:

  ```
  cd site-book
  mvn site
  ```

Note: Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up for your personal repository such that your branches are built there before submitting a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mmiklavc/metron parallel-hdfs-write

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/1157.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1157

commit 0b887fb64f9e4f2682e454e69b42b0b1014f3f4d
Author: Michael Miklavcic
Date: 2018-08-10T05:11:59Z

    Parallelize finalizer writing