[jira] [Commented] (KAFKA-7276) Consider using re2j to speed up regex operations

2018-10-01 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16634727#comment-16634727
 ] 

Ted Yu commented on KAFKA-7276:
---

For #1, see the following:
http://hadoop.apache.org/docs/r3.0.2/api/org/apache/hadoop/metrics2/filter/GlobFilter.html#compile-java.lang.String-

re2j is already used by Hadoop, another Apache project.



> Consider using re2j to speed up regex operations
> 
>
> Key: KAFKA-7276
> URL: https://issues.apache.org/jira/browse/KAFKA-7276
> Project: Kafka
> Issue Type: Task
> Components: packaging
> Reporter: Ted Yu
> Assignee: kevin.chen
> Priority: Major
>
> https://github.com/google/re2j
> re2j claims to do linear time regular expression matching in Java.
> Its benefit is most obvious for deeply nested regex (such as a | b | c | d).
> We should consider using re2j to speed up regex operations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7276) Consider using re2j to speed up regex operations

2018-09-17 Thread John Roesler (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617636#comment-16617636
 ] 

John Roesler commented on KAFKA-7276:
-

Hi [~chenyuyun-emc] and [~yuzhih...@gmail.com],

The license on the project you linked seems not to be a standard one: 
[https://github.com/google/re2j/blob/master/LICENSE]

Before doing any software work, you would have to verify that its license is 
compatible with ours.

Also, it's not clear whether you're talking about using this on the broker or 
client side.

On the broker side, we can be more flexible, but on the client side, we need to 
be extremely skeptical of new dependencies. Since our client code is a library 
that people pull in, we transitively expose them to all of our dependencies, 
setting them up for the Java equivalent of "DLL hell" if they happen to 
(transitively) depend on the same library at a different version.

As much as I like algorithmic efficiency, I would hesitate to bring in a change 
that introduces a new dependency unless there was a benchmark that shows a 
compelling performance improvement in production code.

Would you all consider pursuing these tasks in the following order:
 # verify that we are legally allowed to use this code, with respect to our 
respective licenses
 # put together some experiments to determine what, if any, real performance 
improvement will result from this change
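
For step 2, a minimal sketch of such an experiment might look like the following. It times full-string matching of a wide alternation using java.util.regex only. The class name RegexBench, its helper methods, and the topic-style inputs are hypothetical and not taken from Kafka. The same Predicate could wrap a com.google.re2j.Pattern for comparison, assuming re2j is on the classpath. A hand-rolled timing loop like this is only indicative; a proper harness such as JMH would be needed before drawing conclusions.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

/** Hypothetical timing harness; names are illustrative and not part of Kafka. */
final class RegexBench {

    /** Builds an alternation such as "topic0|topic1|...|topic(n-1)". */
    public static String alternation(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append('|');
            }
            sb.append("topic").append(i);
        }
        return sb.toString();
    }

    /**
     * Applies the matcher to every input, repeated `rounds` times.
     * Returns {number of matches, elapsed nanoseconds}.
     */
    public static long[] time(Predicate<String> matcher, String[] inputs, int rounds) {
        long hits = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (String input : inputs) {
                if (matcher.test(input)) {
                    hits++;
                }
            }
        }
        return new long[] {hits, System.nanoTime() - start};
    }

    public static void main(String[] args) {
        // Compile once, outside the timed loop, so only matching is measured.
        Pattern jdk = Pattern.compile(alternation(100));
        String[] inputs = {"topic7", "topic99", "other", "topic100"};

        long[] result = time(s -> jdk.matcher(s).matches(), inputs, 10_000);
        System.out.println("java.util.regex: hits=" + result[0] + " nanos=" + result[1]);

        // Assumption: a second Predicate backed by com.google.re2j.Pattern
        // could be passed to time() here to compare the two engines.
    }
}
```

Keeping the engine behind a Predicate means the timed loop is identical for both libraries, so any difference comes from the matcher itself. The match count is returned alongside the time so that both engines can be checked for agreement before their timings are compared.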

Thanks,

-John
