[jira] [Commented] (FLINK-2882) Improve performance of string conversions

2015-10-20 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965649#comment-14965649
 ] 

Ufuk Celebi commented on FLINK-2882:


Regarding {{toShortString}}: This is exclusively used in the network stack and 
there it's only used for debugging as far as I know, as part of the 
{{toString}} methods of result partitions, because they require two IDs to 
identify their source/target (which gets very long even as hex strings). Can 
you tell what fraction of calls come from {{toShortString}}? We can also think 
about removing that variant, as it was mostly useful in the early days of
debugging the network stack.

In general, I'm wondering why the ID toString methods are called so often. Can 
you give the top 10 or so stack traces leading to it? And which log level are 
you using?

In any case, both your suggestions sound reasonable to me.

> Improve performance of string conversions
> -
>
> Key: FLINK-2882
> URL: https://issues.apache.org/jira/browse/FLINK-2882
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.10
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>
> {{AbstractID.toString()}} and {{AbstractID.toShortString()}} call 
> {{StringUtils.byteToHexString(...)}}, which uses a StringBuilder to convert 
> from binary to hex. This is a hotspot when scaling the number of workers.
> While testing on my single node with parallelism=512, jvisualvm reports 
> 600,000 calls taking 13.4 seconds. Improving 
> {{StringUtils.byteToHexString(...)}} reduces the time to 1.3 seconds. 
> Additionally, memoizing the string values in {{AbstractID}} reduces the time to 
> 350 ms and the number of calls to {{StringUtils.byteToHexString(...)}} to 
> ~1000.
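The pull request for this issue describes the fix as a lookup table for the byte-to-hex conversion. A minimal Java sketch of that technique follows, under stated assumptions: the class name `HexSketch` is hypothetical, and its method only loosely mirrors the `StringUtils.byteToHexString(...)` name above, so it is not Flink's actual implementation. Each byte is split into two 4-bit nibbles, each nibble indexes a 16-entry table of hex digits, and the output `char[]` is sized once up front instead of growing a StringBuilder.

```java
/**
 * Hypothetical stand-in for a byte-to-hex helper; not Flink's StringUtils.
 */
final class HexSketch {
    // Lookup table: the hex digit for each nibble value 0..15.
    private static final char[] HEX_CHARS = "0123456789abcdef".toCharArray();

    private HexSketch() {}

    /** Converts bytes[start, end) to lowercase hex, two chars per byte. */
    static String byteToHexString(final byte[] bytes, final int start, final int end) {
        if (bytes == null) {
            throw new IllegalArgumentException("bytes must not be null");
        }
        if (start < 0 || end > bytes.length || start > end) {
            throw new IndexOutOfBoundsException("invalid range [" + start + ", " + end + ")");
        }
        // Allocate the exact output size once.
        final char[] out = new char[(end - start) * 2];
        for (int i = start, j = 0; i < end; i++) {
            final int v = bytes[i] & 0xFF;    // widen to an unsigned value 0..255
            out[j++] = HEX_CHARS[v >>> 4];    // high nibble
            out[j++] = HEX_CHARS[v & 0x0F];   // low nibble
        }
        return new String(out);
    }

    /** Converts the whole array. */
    static String byteToHexString(final byte[] bytes) {
        return byteToHexString(bytes, 0, bytes.length);
    }
}
```

For example, `HexSketch.byteToHexString(new byte[] {0x00, 0x0f, (byte) 0xab})` yields `"000fab"`. The table lookup replaces per-byte formatting work with two array reads and two char stores.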



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2882) Improve performance of string conversions

2015-10-21 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966670#comment-14966670
 ] 

Stephan Ewen commented on FLINK-2882:
-

Interesting find!

I am curious, how often do we need to call this method, actually? When we 
transfer IDs, we do not convert them to strings. So is this all done for the 
sake of logging?

If so, is the logging by itself also becoming a bottleneck when increasing the 
DOP?



[jira] [Commented] (FLINK-2882) Improve performance of string conversions

2015-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057957#comment-15057957
 ] 

ASF GitHub Bot commented on FLINK-2882:
---

GitHub user greghogan opened a pull request:

https://github.com/apache/flink/pull/1455

[FLINK-2882] [core] Improve performance of string conversions

Memoize string representations of AbstractID.

Use lookup table for byte-to-hex conversion in StringUtils.
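Memoizing here means computing an ID's hex string once and returning the cached value on later calls. The sketch below assumes an ID made of two `long` halves; `MemoizedId` and its fields are hypothetical and simplified, not Flink's actual `AbstractID`. It uses the racy single-check idiom: the cache field is not `volatile`, so two threads may both compute the string, which is harmless because the result is identical and `String` is immutable.

```java
/**
 * Hypothetical, simplified ID class illustrating a memoized toString();
 * not Flink's actual AbstractID.
 */
final class MemoizedId {
    private static final char[] HEX_CHARS = "0123456789abcdef".toCharArray();

    private final long lowerPart;
    private final long upperPart;

    // Cached hex form, filled in on the first toString() call.
    // Not volatile: a racing thread may recompute it, yielding an equal value.
    private String hexString;

    MemoizedId(final long lowerPart, final long upperPart) {
        this.lowerPart = lowerPart;
        this.upperPart = upperPart;
    }

    @Override
    public String toString() {
        String s = hexString;                          // read the field once
        if (s == null) {
            s = toHex(lowerPart) + toHex(upperPart);   // 32 hex chars in total
            hexString = s;
        }
        return s;
    }

    // Fixed-width (16 char) hex rendering of a long, most significant nibble first.
    private static String toHex(long value) {
        final char[] out = new char[16];
        for (int i = 15; i >= 0; i--) {
            out[i] = HEX_CHARS[(int) (value & 0xF)];
            value >>>= 4;                              // unsigned shift handles negatives
        }
        return new String(out);
    }
}
```

After the first call, every `toString()` on the same ID returns the same cached `String` instance without any conversion work. The trade-off is one extra 32-character string per printed ID; if such a class is `Serializable`, marking the cache field `transient` keeps it out of the serialized form.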

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/greghogan/flink 2882_improve_performance_of_string_conversions

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1455.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1455


commit e23322855f94245783d7f0a5accb1558d2816152
Author: Greg Hogan 
Date:   2015-12-14T19:47:33Z

[FLINK-2882] [core] Improve performance of string conversions

Memoize string representations of AbstractID.

Use lookup table for byte-to-hex conversion in StringUtils.






[jira] [Commented] (FLINK-2882) Improve performance of string conversions

2015-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058169#comment-15058169
 ] 

ASF GitHub Bot commented on FLINK-2882:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1455#issuecomment-164790091
  
Thanks for your contribution @greghogan. Efficiency improvements look 
really good. The test failures seem to be unrelated.

Would be great if you could add a small test for the StringUtils method.

Apart from that, +1 for merging.




[jira] [Commented] (FLINK-2882) Improve performance of string conversions

2015-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058174#comment-15058174
 ] 

ASF GitHub Bot commented on FLINK-2882:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1455#issuecomment-164790880
  
Looks very good.

I am wondering if we can get rid of the toShortString method. What is that 
one actually used for?




[jira] [Commented] (FLINK-2882) Improve performance of string conversions

2015-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058223#comment-15058223
 ] 

ASF GitHub Bot commented on FLINK-2882:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1455#issuecomment-164802103
  
Yes, that's good to remove. I pointed that out as well when Greg opened the 
initial issue.

We can either do it as part of this PR or I can do it as a follow up.




[jira] [Commented] (FLINK-2882) Improve performance of string conversions

2015-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058304#comment-15058304
 ] 

ASF GitHub Bot commented on FLINK-2882:
---

Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1455#issuecomment-164824368
  
@tillrohrmann There is already a StringUtilsTest with a small test.

One of the Travis CI failures was a timeout, and the other build was killed by 
the watchdog. Are timeouts being reworked to use RetryOnException or similar?

@uce It looks like toShortString is used for more than debugging and tests, 
so I'd prefer a separate PR.




[jira] [Commented] (FLINK-2882) Improve performance of string conversions

2015-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058690#comment-15058690
 ] 

ASF GitHub Bot commented on FLINK-2882:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1455#issuecomment-164880926
  
Looks good. +1 for removing `toShortString` in a separate issue.
I'll merge this PR.




[jira] [Commented] (FLINK-2882) Improve performance of string conversions

2015-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058749#comment-15058749
 ] 

ASF GitHub Bot commented on FLINK-2882:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1455

