[jira] [Commented] (OOZIE-2457) Oozie log parsing regex consume more than 90% cpu

2017-04-07 Thread Satish Subhashrao Saley (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961335#comment-15961335
 ] 

Satish Subhashrao Saley commented on OOZIE-2457:


Thanks, Robert, for the review. Committed to master.

> Oozie log parsing regex consume more than 90% cpu
> -
>
> Key: OOZIE-2457
> URL: https://issues.apache.org/jira/browse/OOZIE-2457
> Project: Oozie
> Issue Type: Bug
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
>Priority: Blocker
> Fix For: 5.0.0
>
> Attachments: OOZIE-2457-1.patch, OOZIE-2457-2.patch, 
> OOZIE-2457-3.patch, OOZIE-2457-4.patch, OOZIE-2457-5.patch, 
> OOZIE-2457-6.patch, OOZIE-2457-7.patch
>
>
> http-0.0.0.0-4080-26  TID=62215  STATE=RUNNABLE  CPU_TIME=1992 (92.59%)  USER_TIME=1990 (92.46%) Allocted: 269156584
> java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
> java.util.regex.Pattern$Curly.match(Pattern.java:4132)
> java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
> java.util.regex.Matcher.match(Matcher.java:1221)
> java.util.regex.Matcher.matches(Matcher.java:559)
> org.apache.oozie.util.XLogFilter.matches(XLogFilter.java:136)
> org.apache.oozie.util.TimestampedMessageParser.parseNextLine(TimestampedMessageParser.java:145)
> org.apache.oozie.util.TimestampedMessageParser.increment(TimestampedMessageParser.java:92)
> Regex 
> {code}
> (.* USER\[[^\]]*\] GROUP\[[^\]]*\] TOKEN\[[^\]]*\] APP\[[^\]]*\] JOB\[000-150625114739728-oozie-puru-W\] ACTION\[[^\]]*\] .*)
> {code}
> To parse a single line, we apply two regexes.
> 1. 
> {code}
> public ArrayList<String> splitLogMessage(String logLine) {
>     Matcher splitter = SPLITTER_PATTERN.matcher(logLine);
>     if (splitter.matches()) {
>         ArrayList<String> logParts = new ArrayList<String>();
>         logParts.add(splitter.group(1)); // timestamp
>         logParts.add(splitter.group(2)); // log level
>         logParts.add(splitter.group(3)); // log message
>         return logParts;
>     }
>     else {
>         return null;
>     }
> }
> {code}
> 2.
> {code}
> public boolean matches(ArrayList<String> logParts) {
>     if (getStartDate() != null) {
>         if (logParts.get(0).substring(0, 19).compareTo(getFormattedStartDate()) < 0) {
>             return false;
>         }
>     }
>     String logLevel = logParts.get(1);
>     String logMessage = logParts.get(2);
>     if (this.logLevels == null || this.logLevels.containsKey(logLevel.toUpperCase())) {
>         Matcher logMatcher = filterPattern.matcher(logMessage);
>         return logMatcher.matches();
>     }
>     else {
>         return false;
>     }
> }
> {code}
> The same log message is also parsed repeatedly in
> {code}
> private String parseTimestamp(String line) {
>     String timestamp = null;
>     ArrayList<String> logParts = filter.splitLogMessage(line);
>     if (logParts != null) {
>         timestamp = logParts.get(0);
>     }
>     return timestamp;
> }
> {code}
> where the {{line}} has already been parsed with the regex, so the
> {{logParts}}, if any, are already known.
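
The two costs described in the quoted report (running the wildcard-heavy
filter regex on every line, and splitting the same line more than once) can
be illustrated with a short Java sketch. This is not the committed patch.
The class LogLineSketch, the LogParts holder, the helper names and the
simplified splitter pattern are all assumptions made for illustration; only
the shape of the filter regex is taken from the report. The idea is to split
each line once and pass the result around, and to run a cheap literal check
for the job id before falling back to the regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; every name here is hypothetical, not Oozie's API. */
final class LogLineSketch {

    // Assumed line layout: "yyyy-MM-dd HH:mm:ss,SSS  LEVEL message".
    // The real SPLITTER_PATTERN is not shown in the report.
    private static final Pattern SPLITTER_PATTERN = Pattern.compile(
            "(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})\\s+(\\w+)\\s+(.*)");

    /** Result of splitting one line, kept so that callers never re-run the splitter. */
    static final class LogParts {
        final String timestamp;
        final String level;
        final String message;

        LogParts(String timestamp, String level, String message) {
            this.timestamp = timestamp;
            this.level = level;
            this.message = message;
        }
    }

    /** Splits a line once; returns null for lines that do not start a log record. */
    static LogParts split(String logLine) {
        Matcher m = SPLITTER_PATTERN.matcher(logLine);
        if (!m.matches()) {
            return null;
        }
        return new LogParts(m.group(1), m.group(2), m.group(3));
    }

    /** Builds a filter regex of the same shape as the one quoted in the report. */
    static Pattern jobPattern(String jobId) {
        return Pattern.compile("(.* USER\\[[^\\]]*\\] GROUP\\[[^\\]]*\\] TOKEN\\[[^\\]]*\\]"
                + " APP\\[[^\\]]*\\] JOB\\[" + Pattern.quote(jobId) + "\\] ACTION\\[[^\\]]*\\] .*)");
    }

    /**
     * The regex can only match if the literal "JOB[id]" occurs in the message,
     * so lines for other jobs are rejected by a plain substring search and the
     * regex runs only on the remaining candidates.
     */
    static boolean matchesJob(LogParts parts, String jobId, Pattern jobPattern) {
        return parts.message.contains("JOB[" + jobId + "]")
                && jobPattern.matcher(parts.message).matches();
    }
}
```

A caller would invoke split once per line, read the timestamp from the
returned LogParts instead of calling the splitter again, and pass the same
object to matchesJob. The literal check does not change which lines are
accepted; it only skips the regex for lines that cannot match.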



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (OOZIE-2457) Oozie log parsing regex consume more than 90% cpu

2017-04-06 Thread Robert Kanter (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960025#comment-15960025
 ] 

Robert Kanter commented on OOZIE-2457:
--

+1


[jira] [Commented] (OOZIE-2457) Oozie log parsing regex consume more than 90% cpu

2017-03-30 Thread Satish Subhashrao Saley (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950059#comment-15950059
 ] 

Satish Subhashrao Saley commented on OOZIE-2457:


Hi [~rkanter], thanks for the review. I don't think we need to add extra test 
cases, as this fix only changes the way we parse the logs and does not add any 
new functionality.


[jira] [Commented] (OOZIE-2457) Oozie log parsing regex consume more than 90% cpu

2017-03-30 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950034#comment-15950034
 ] 

Hadoop QA commented on OOZIE-2457:
--

Testing JIRA OOZIE-2457

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:red}-1{color} the patch does not add/modify any testcase
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1{color} There are no new bugs found in total.
. {color:green}+1{color} There are no new bugs found in [server].
. {color:green}+1{color} There are no new bugs found in [client].
. {color:green}+1{color} There are no new bugs found in [core].
. {color:green}+1{color} There are no new bugs found in [docs].
. {color:green}+1{color} There are no new bugs found in 
[hadooplibs/hadoop-utils-2].
. {color:green}+1{color} There are no new bugs found in [tools].
. {color:green}+1{color} There are no new bugs found in [examples].
. {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
. {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
. {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
. {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
. {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
. {color:green}+1{color} There are no new bugs found in [sharelib/pig].
. {color:green}+1{color} There are no new bugs found in [sharelib/spark].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 1906
.Tests rerun: 16
.Tests failed at first run: 
org.apache.oozie.action.TestActionFailover,org.apache.oozie.jms.TestJMSJobEventListener,
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

. https://builds.apache.org/job/oozie-trunk-precommit-build/3755/


[jira] [Commented] (OOZIE-2457) Oozie log parsing regex consume more than 90% cpu

2017-03-29 Thread Robert Kanter (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948245#comment-15948245
 ] 

Robert Kanter commented on OOZIE-2457:
--

[~satishsaley], sorry for taking so long to do another review.  It mostly looks 
good to me.  I've left a few trivial things on ReviewBoard.  

I was also wondering whether you think we need any additional unit tests for 
the changes. I know we already have a bunch of existing tests, and the fact 
that they pass is a good sign.


[jira] [Commented] (OOZIE-2457) Oozie log parsing regex consume more than 90% cpu

2017-03-29 Thread Jan Filipiak (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947614#comment-15947614
 ] 

Jan Filipiak commented on OOZIE-2457:
-

Hi, are there any plans to change the logging implementation entirely? Maybe 
keep a log per Bundle/Coord/Job? Would a patch be welcome that stores those 
logs in HDFS with configurable retention times?

Don't get me wrong, but I just ran into this issue today and couldn't really 
believe it.


[jira] [Commented] (OOZIE-2457) Oozie log parsing regex consume more than 90% cpu

2017-02-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887354#comment-15887354
 ] 

Hadoop QA commented on OOZIE-2457:
--

Testing JIRA OOZIE-2457

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:red}-1{color} the patch does not add/modify any testcase
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1{color} There are no new bugs found in total.
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [core].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in 
[hadooplibs/hadoop-utils-2].
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 1886
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/3664/


[jira] [Commented] (OOZIE-2457) Oozie log parsing regex consume more than 90% cpu

2017-02-27 Thread Satish Subhashrao Saley (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887210#comment-15887210
 ] 

Satish Subhashrao Saley commented on OOZIE-2457:


Hi [~rkanter], could you please review my latest patch? I have addressed your 
comments from ReviewBoard.


[jira] [Commented] (OOZIE-2457) Oozie log parsing regex consume more than 90% cpu

2017-02-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886564#comment-15886564
 ] 

Hadoop QA commented on OOZIE-2457:
--

Testing JIRA OOZIE-2457

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:red}-1{color} the patch contains 1 line(s) with trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:red}-1{color} the patch does not add/modify any testcase
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1{color} There are no new bugs found in total.
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [core].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in 
[hadooplibs/hadoop-utils-2].
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 1886
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/3663/


[jira] [Commented] (OOZIE-2457) Oozie log parsing regex consume more than 90% cpu

2017-02-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883438#comment-15883438
 ] 

Hadoop QA commented on OOZIE-2457:
--

Testing JIRA OOZIE-2457

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:red}-1{color} the patch contains 1 line(s) with trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 132
.{color:red}-1{color} the patch does not add/modify any testcase
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1{color} There are no new bugs found in total.
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [core].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in [hadooplibs/hadoop-utils-2].
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 1879
.Tests failed: 4
.Tests errors: 0

.The patch failed the following testcases:

.  testProcessRemainingLog(org.apache.oozie.util.TestTimestampedMessageParser)
.  testFsFailover(org.apache.oozie.action.TestActionFailover)
.  testloglevel_Error(org.apache.oozie.util.TestXLogUserFilterParam)
.  testConnectionDrop(org.apache.oozie.jms.TestJMSJobEventListener)

.Tests failing with errors:
.  

{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/3659/

> Oozie log parsing regex consume more than 90% cpu
> -
>
> Key: OOZIE-2457
> URL: https://issues.apache.org/jira/browse/OOZIE-2457
> Project: Oozie
> Issue Type: Bug
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Priority: Blocker
> Fix For: 5.0.0
>
> Attachments: OOZIE-2457-1.patch, OOZIE-2457-2.patch, OOZIE-2457-3.patch, OOZIE-2457-4.patch
>

[jira] [Commented] (OOZIE-2457) Oozie log parsing regex consume more than 90% cpu

2016-07-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396842#comment-15396842
 ] 

Hadoop QA commented on OOZIE-2457:
--

Testing JIRA OOZIE-2457

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 132
.{color:red}-1{color} the patch does not add/modify any testcase
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:red}-1 COMPILE{color}
.{color:red}-1{color} HEAD does not compile
.{color:red}-1{color} patch does not compile
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color} - patch does not compile, cannot run testcases
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/3100/

> Oozie log parsing regex consume more than 90% cpu
> -
>
> Key: OOZIE-2457
> URL: https://issues.apache.org/jira/browse/OOZIE-2457
> Project: Oozie
> Issue Type: Bug
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Priority: Minor
> Attachments: OOZIE-2457-1.patch, OOZIE-2457-2.patch, OOZIE-2457-3.patch
>
> http-0.0.0.0-4080-26  TID=62215  STATE=RUNNABLE  CPU_TIME=1992 (92.59%)  USER_TIME=1990 (92.46%) Allocted: 269156584
> java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
> java.util.regex.Pattern$Curly.match(Pattern.java:4132)
> java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
> java.util.regex.Matcher.match(Matcher.java:1221)
> java.util.regex.Matcher.matches(Matcher.java:559)
> org.apache.oozie.util.XLogFilter.matches(XLogFilter.java:136)
> org.apache.oozie.util.TimestampedMessageParser.parseNextLine(TimestampedMessageParser.java:145)
> org.apache.oozie.util.TimestampedMessageParser.increment(TimestampedMessageParser.java:92)
> Regex 
> {code}
> (.* USER\[[^\]]*\] GROUP\[[^\]]*\] TOKEN\[[^\]]*\] APP\[[^\]]*\] JOB\[000-150625114739728-oozie-puru-W\] ACTION\[[^\]]*\] .*)
> {code}
> To parse a single line we use two regexes.
> 1. 
> {code}
> public ArrayList<String> splitLogMessage(String logLine) {
>     Matcher splitter = SPLITTER_PATTERN.matcher(logLine);
>     if (splitter.matches()) {
>         ArrayList<String> logParts = new ArrayList<String>();
>         logParts.add(splitter.group(1)); // timestamp
>         logParts.add(splitter.group(2)); // log level
>         logParts.add(splitter.group(3)); // log message
>         return logParts;
>     }
>     else {
>         return null;
>     }
> }
> {code}
> 2.
> {code}
> public boolean matches(ArrayList<String> logParts) {
>     if (getStartDate() != null) {
>         if (logParts.get(0).substring(0, 19).compareTo(getFormattedStartDate()) < 0) {
>             return false;
>         }
>     }
>     String logLevel = logParts.get(1);
>     String logMessage = logParts.get(2);
>     if (this.logLevels == null || this.logLevels.containsKey(logLevel.toUpperCase())) {
>         Matcher logMatcher = filterPattern.matcher(logMessage);
>         return logMatcher.matches();
>     }
>     else {
>         return false;
>     }
> }
> {code}
> Also, the same log message is parsed repeatedly in
> {code}
> private String parseTimestamp(String line) {
>     String timestamp = null;
>     ArrayList<String> logParts = filter.splitLogMessage(line);
>     if (logParts != null) {
>         timestamp = logParts.get(0);
>     }
>     return timestamp;
> }
> {code}
> where the {{line}} has already been parsed with the regex and we already know the {{logParts}}, if any.
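
The double-parsing concern in the quoted description can be sketched as follows. This is a minimal illustration under stated assumptions, not the committed OOZIE-2457 patch: `LogLineFilterSketch`, `LogParts`, the `SPLITTER` pattern and the sample line layout are all hypothetical names and formats. The sketch splits each line once and hands the result to the filter, and it swaps the leading-`.*` regex for a plain `String.contains` check on the `JOB[...]` token. That check is looser than the original pattern, since it does not verify the USER/GROUP/TOKEN/APP fields.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; every name here is hypothetical, not Oozie's API. */
final class LogLineFilterSketch {

    // Assumed line layout: "<yyyy-MM-dd HH:mm:ss[,SSS]> <LEVEL> <message>".
    private static final Pattern SPLITTER = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}(?:,\\d{3})?)\\s+(\\w+)\\s+(.*)$");

    /** Result of splitting one line, kept so callers never re-run the regex. */
    static final class LogParts {
        final String timestamp;
        final String level;
        final String message;

        LogParts(String timestamp, String level, String message) {
            this.timestamp = timestamp;
            this.level = level;
            this.message = message;
        }
    }

    // Literal token to look for, for example "JOB[<job id>]".
    private final String jobToken;

    LogLineFilterSketch(String jobId) {
        this.jobToken = "JOB[" + jobId + "]";
    }

    /** Splits a line once; returns null when the line does not start a log record. */
    static LogParts split(String line) {
        Matcher m = SPLITTER.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new LogParts(m.group(1), m.group(2), m.group(3));
    }

    /** Literal containment check in place of a backtracking ".* ... .*" match. */
    boolean matches(LogParts parts) {
        return parts != null && parts.message.contains(jobToken);
    }
}
```

A caller would invoke `split` once per line and pass the same `LogParts` both to `matches` and to whatever needs the timestamp, instead of splitting the line again.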



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (OOZIE-2457) Oozie log parsing regex consume more than 90% cpu

2016-06-16 Thread Satish Subhashrao Saley (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335024#comment-15335024
 ] 

Satish Subhashrao Saley commented on OOZIE-2457:


Hello [~rkanter], I have addressed your comments and updated the patch. Could you please take a look?


[jira] [Commented] (OOZIE-2457) Oozie log parsing regex consume more than 90% cpu

2016-06-15 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331427#comment-15331427
 ] 

Hadoop QA commented on OOZIE-2457:
--

Testing JIRA OOZIE-2457

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 132
.{color:red}-1{color} the patch does not add/modify any testcase
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 1787
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/2992/


[jira] [Commented] (OOZIE-2457) Oozie log parsing regex consume more than 90% cpu

2016-02-24 Thread Satish Subhashrao Saley (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166333#comment-15166333
 ] 

Satish Subhashrao Saley commented on OOZIE-2457:


[Review link|https://reviews.apache.org/r/43970/]

> Oozie log parsing regex consume more than 90% cpu
> -
>
> Key: OOZIE-2457
> URL: https://issues.apache.org/jira/browse/OOZIE-2457
> Project: Oozie
> Issue Type: Bug
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Priority: Minor
> Attachments: OOZIE-2457-1.patch, OOZIE-2457-2.patch
>

[jira] [Commented] (OOZIE-2457) Oozie log parsing regex consume more than 90% cpu

2016-02-24 Thread Purshotam Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15165918#comment-15165918
 ] 

Purshotam Shah commented on OOZIE-2457:
---

Can you post the patch to Review Board?


[jira] [Commented] (OOZIE-2457) Oozie log parsing regex consume more than 90% cpu

2016-02-19 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154087#comment-15154087
 ] 

Hadoop QA commented on OOZIE-2457:
--

Testing JIRA OOZIE-2457

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:red}-1{color} the patch contains 2 line(s) with trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 132
.{color:red}-1{color} the patch does not add/modify any testcase
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 1762
.Tests failed: 3
.Tests errors: 0

.The patch failed the following testcases:

.  testbulkWfKillSuccess(org.apache.oozie.command.wf.TestBulkWorkflowXCommand)
.  testForNoDuplicates(org.apache.oozie.event.TestEventGeneration)
.  testSamplers(org.apache.oozie.util.TestMetricsInstrumentation)

{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/2750/


[jira] [Commented] (OOZIE-2457) Oozie log parsing regex consume more than 90% cpu

2016-02-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/OOZIE-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152925#comment-15152925
 ] 

Hadoop QA commented on OOZIE-2457:
--

Testing JIRA OOZIE-2457

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:red}-1{color} the patch contains 2 line(s) with trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 132
.{color:red}-1{color} the patch does not add/modify any testcase
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 1761
.Tests failed: 2
.Tests errors: 2

.The patch failed the following testcases:

.  testForNoDuplicates(org.apache.oozie.event.TestEventGeneration)
.  testSamplers(org.apache.oozie.util.TestMetricsInstrumentation)

{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/2748/

> Oozie log parsing regex consume more than 90% cpu
> -
>
> Key: OOZIE-2457
> URL: https://issues.apache.org/jira/browse/OOZIE-2457
> Project: Oozie
> Issue Type: Bug
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Priority: Minor
> Attachments: OOZIE-2457-1.patch
>
