Re: Review Request 43970: [OOZIE-2457] Oozie log parsing regex consume more than 90% cpu

2017-04-06 Thread Robert Kanter

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43970/#review171290
---


Ship it!





- Robert Kanter


On March 30, 2017, 8:45 p.m., Satish Saley wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43970/
> ---
> 
> (Updated March 30, 2017, 8:45 p.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: https://issues.apache.org/jira/browse/OOZIE-2417
> 
> 
> 
> Repository: oozie-git
> 
> 
> Description
> ---
> 
> This patch tries to avoid parsing the same log line multiple times with 
> different regexes.
> It also caches the log parts once they have been identified, so the log line 
> does not have to be re-parsed each time those parts are needed.
> 
> 
> Diffs
> ---
> 
>   core/src/main/java/org/apache/oozie/util/LogLine.java PRE-CREATION 
>   core/src/main/java/org/apache/oozie/util/SimpleTimestampedMessageParser.java 78cb042 
>   core/src/main/java/org/apache/oozie/util/TimestampedMessageParser.java a676f4d 
>   core/src/main/java/org/apache/oozie/util/XLogFilter.java 3b49f77 
> 
> 
> Diff: https://reviews.apache.org/r/43970/diff/6/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Satish Saley
> 
>



Re: Review Request 43970: [OOZIE-2457] Oozie log parsing regex consume more than 90% cpu

2017-03-30 Thread Satish Saley

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43970/
---

(Updated March 30, 2017, 1:45 p.m.)


Review request for oozie.


Bugs: https://issues.apache.org/jira/browse/OOZIE-2417



Repository: oozie-git


Description
---

This patch tries to avoid parsing the same log line multiple times with 
different regexes.
It also caches the log parts once they have been identified, so the log line 
does not have to be re-parsed each time those parts are needed.
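
A minimal sketch of the caching idea described above, not the actual LogLine 
class from the patch. The class name CachedLogLine, the assumed 
"<timestamp> <LEVEL> <message>" layout, and the regexRuns counter are all 
hypothetical and exist only for illustration. The regex runs once on first 
access and every getter afterwards reads the cached parts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the LogLine class from the patch.
final class CachedLogLine {

    // Assumed layout for illustration only: "<timestamp> <LEVEL> <message>".
    private static final Pattern LAYOUT = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})\\s+(\\w+)\\s+(.*)$");

    // Counts how often the regex actually ran; for demonstration only.
    static int regexRuns = 0;

    private final String raw;
    private boolean parsed;   // true once the regex has been applied
    private boolean matched;  // true if the line had the assumed layout
    private String timestamp;
    private String level;
    private String message;

    CachedLogLine(String raw) {
        this.raw = raw;
    }

    // Applies the regex at most once and caches the captured parts.
    private void parseOnce() {
        if (parsed) {
            return;
        }
        parsed = true;
        regexRuns++;
        Matcher m = LAYOUT.matcher(raw);
        if (m.matches()) {
            matched = true;
            timestamp = m.group(1);
            level = m.group(2);
            message = m.group(3);
        }
    }

    boolean isMatched() { parseOnce(); return matched; }
    String getTimestamp() { parseOnce(); return timestamp; }
    String getLevel() { parseOnce(); return level; }
    String getMessage() { parseOnce(); return message; }

    public static void main(String[] args) {
        CachedLogLine line = new CachedLogLine("2017-03-30 13:45:00,123 INFO Job started");
        // Several getters are called, but the regex is applied only on the first call.
        System.out.println(line.getLevel() + " / " + line.getMessage());
        System.out.println("regex runs: " + regexRuns);
    }
}
```

Lazy parsing keeps construction cheap for lines that are never inspected; 
parsing eagerly in the constructor would be an equally valid choice.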


Diffs (updated)
---

  core/src/main/java/org/apache/oozie/util/LogLine.java PRE-CREATION 
  core/src/main/java/org/apache/oozie/util/SimpleTimestampedMessageParser.java 78cb042 
  core/src/main/java/org/apache/oozie/util/TimestampedMessageParser.java a676f4d 
  core/src/main/java/org/apache/oozie/util/XLogFilter.java 3b49f77 


Diff: https://reviews.apache.org/r/43970/diff/6/

Changes: https://reviews.apache.org/r/43970/diff/5-6/


Testing
---


Thanks,

Satish Saley



Re: Review Request 43970: [OOZIE-2457] Oozie log parsing regex consume more than 90% cpu

2017-03-29 Thread Robert Kanter


> On March 30, 2017, 1:39 a.m., Robert Kanter wrote:
> > core/src/main/java/org/apache/oozie/util/TimestampedMessageParser.java
> > Lines 194-214 (patched)
> > 
> >
> > This code is almost identical to the code above.  Is there any way to 
> > combine them?  Otherwise, it would be easy for these to diverge 
> > accidentally in the future.

Actually, see my other comment about parseNextLine not being needed anymore.  
In that case, there's no duplication.


- Robert


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43970/#review170509
---


On Feb. 27, 2017, 7:42 p.m., Satish Saley wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43970/
> ---
> 
> (Updated Feb. 27, 2017, 7:42 p.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: https://issues.apache.org/jira/browse/OOZIE-2417
> 
> 
> 
> Repository: oozie-git
> 
> 
> Description
> ---
> 
> This patch tries to avoid parsing the same log line multiple times with 
> different regexes.
> It also caches the log parts once they have been identified, so the log line 
> does not have to be re-parsed each time those parts are needed.
> 
> 
> Diffs
> ---
> 
>   core/src/main/java/org/apache/oozie/util/LogLine.java PRE-CREATION 
>   core/src/main/java/org/apache/oozie/util/SimpleTimestampedMessageParser.java 78cb042 
>   core/src/main/java/org/apache/oozie/util/TimestampedMessageParser.java a676f4d 
>   core/src/main/java/org/apache/oozie/util/XLogFilter.java 3b49f77 
> 
> 
> Diff: https://reviews.apache.org/r/43970/diff/5/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Satish Saley
> 
>



Re: Review Request 43970: [OOZIE-2457] Oozie log parsing regex consume more than 90% cpu

2017-02-27 Thread Satish Saley


> On Sept. 13, 2016, 2:35 p.m., Robert Kanter wrote:
> > core/src/main/java/org/apache/oozie/util/XLogFilter.java, line 213
> > 
> >
> > Shouldn't this be true?
> 
> Satish Saley wrote:
> I revisited this logic. I have renamed isSplit to isMatched, because it 
> makes more sense. We set isMatched to true only if the log line matches 
> the log filter pattern; otherwise we set it to false. If the log line matches 
> the log filter pattern, we cut the message into three parts and record 
> those parts in a list to avoid further regex matching later in the code.

In the latest patch, I replaced this with an enum having 3 possible values. An 
enum is needed to distinguish between a line that does not match the pattern 
and a line that does not match the pattern but should still be included in the 
log (such as part of a stack trace).
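
To illustrate, here is a rough sketch of such a three-state classification. 
The names MatchState, MATCHED, CONTINUATION, NOT_MATCHED, and classify are 
hypothetical and are not taken from the patch. The sketch also assumes that a 
new log entry starts with a date, and that a line without one is included only 
if the entry before it was included.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; names and rules are illustrative, not taken from the patch.
final class MatchStateSketch {

    enum MatchState {
        MATCHED,       // line matches the log filter pattern
        CONTINUATION,  // no match, but belongs to an included entry (e.g. stack trace)
        NOT_MATCHED    // no match and should be left out
    }

    // Assumption: a new log entry starts with a date such as "2017-02-27 ...".
    private static final Pattern ENTRY_START = Pattern.compile("^\\d{4}-\\d{2}-\\d{2} .*");

    // Classifies one line given the filter and the state of the preceding line.
    static MatchState classify(String line, Pattern filter, MatchState previous) {
        if (ENTRY_START.matcher(line).matches()) {
            return filter.matcher(line).find() ? MatchState.MATCHED : MatchState.NOT_MATCHED;
        }
        // No entry header: inherit inclusion from the preceding entry.
        return previous == MatchState.NOT_MATCHED
                ? MatchState.NOT_MATCHED
                : MatchState.CONTINUATION;
    }

    public static void main(String[] args) {
        Pattern filter = Pattern.compile("ERROR");
        String[] lines = {
            "2017-02-27 11:42:00,001 ERROR Something failed",
            "\tat org.example.Foo.bar(Foo.java:1)",
            "2017-02-27 11:42:01,000 INFO All good"
        };
        MatchState state = MatchState.NOT_MATCHED;
        for (String line : lines) {
            state = classify(line, filter, state);
            System.out.println(state + " <- " + line);
        }
    }
}
```

A plain boolean cannot express the middle case, which is why a third value is 
needed.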


- Satish


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43970/#review148786
---


On Feb. 27, 2017, 11:42 a.m., Satish Saley wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43970/
> ---
> 
> (Updated Feb. 27, 2017, 11:42 a.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: https://issues.apache.org/jira/browse/OOZIE-2417
> 
> 
> 
> Repository: oozie-git
> 
> 
> Description
> ---
> 
> This patch tries to avoid parsing the same log line multiple times with 
> different regexes.
> It also caches the log parts once they have been identified, so the log line 
> does not have to be re-parsed each time those parts are needed.
> 
> 
> Diffs
> ---
> 
>   core/src/main/java/org/apache/oozie/util/LogLine.java PRE-CREATION 
>   core/src/main/java/org/apache/oozie/util/SimpleTimestampedMessageParser.java 78cb042 
>   core/src/main/java/org/apache/oozie/util/TimestampedMessageParser.java a676f4d 
>   core/src/main/java/org/apache/oozie/util/XLogFilter.java 3b49f77 
> 
> Diff: https://reviews.apache.org/r/43970/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Satish Saley
> 
>



Re: Review Request 43970: [OOZIE-2457] Oozie log parsing regex consume more than 90% cpu

2017-02-27 Thread Satish Saley

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43970/
---

(Updated Feb. 27, 2017, 11:42 a.m.)


Review request for oozie.


Bugs: https://issues.apache.org/jira/browse/OOZIE-2417



Repository: oozie-git


Description
---

This patch tries to avoid parsing the same log line multiple times with 
different regexes.
It also caches the log parts once they have been identified, so the log line 
does not have to be re-parsed each time those parts are needed.


Diffs (updated)
---

  core/src/main/java/org/apache/oozie/util/LogLine.java PRE-CREATION 
  core/src/main/java/org/apache/oozie/util/SimpleTimestampedMessageParser.java 78cb042 
  core/src/main/java/org/apache/oozie/util/TimestampedMessageParser.java a676f4d 
  core/src/main/java/org/apache/oozie/util/XLogFilter.java 3b49f77 

Diff: https://reviews.apache.org/r/43970/diff/


Testing
---


Thanks,

Satish Saley