[ https://issues.apache.org/jira/browse/CALCITE-2974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818220#comment-16818220 ]
Valeriy Trofimov edited comment on CALCITE-2974 at 4/15/19 5:54 PM: -------------------------------------------------------------------- Doing a quick non-scientific perf measure by running a unit test like this: {code:java} @Test public void testCastTimestampJIRA2974() { // Test cast for date/time/timestamp tester.setFor(SqlStdOperatorTable.CAST); tester.checkScalar( "cast('1945-02-24 12:42:25.34' as TIMESTAMP)", "1945-02-24 12:42:25", "TIMESTAMP(0) NOT NULL"); } {code} On a RexSimplify.simplifyCast part of the code like this: {code:java} long t1 = java.lang.System.nanoTime(); // current code: //executor.reduce(rexBuilder, ImmutableList.of(e), reducedValues); // proposed code: // if converting from CHAR to TIMESTAMP, try to convert manually to improve perf boolean isFromChar = typeName == SqlTypeName.CHAR; boolean isToTimeStamp = e.getType().getSqlTypeName() == SqlTypeName.TIMESTAMP; if (isFromChar) { // if converting from CHAR... String toConvert = ((NlsString) literal.getValue()).getValue(); if (isToTimeStamp) { // ...to TIMESTAMP try { Long converted = DateTimeUtils.timestampStringToUnixDate(toConvert); reducedValues.add(rexBuilder.makeLiteral(converted, e.getType(), true)); } catch (Exception e2) { } } } if (reducedValues.isEmpty()) { // if manual conversion failed, use reduce() method executor.reduce(rexBuilder, ImmutableList.of(e), reducedValues); } long t2 = java.lang.System.nanoTime(); System.out.println(t2 - t1); {code} Shows (t2 - t1) value of 1376990. If you comment out the proposed code and uncomment the current code you will see the (t2 - t1) value of 323713768. It's a 235-fold perf difference between current and proposed code. As Danny mentioned, this is not the only code that gets run during query planning, but it gets called a lot in our case, so the numbers add up. It's possible that we have a bug in our code that calls it too many times - we'll look into that. But we want to make our product as high-perf as possible, so we welcome any perf improvements like this. was (Author: valtroffuture): Doing a quick non-scientific perf measure by running a unit test like this: {code:java} @Test public void testCastTimestampJIRA2974() { // Test cast for date/time/timestamp tester.setFor(SqlStdOperatorTable.CAST); tester.checkScalar( "cast('1945-02-24 12:42:25.34' as TIMESTAMP)", "1945-02-24 12:42:25", "TIMESTAMP(0) NOT NULL"); } {code} On a RexSimplify.simplifyCast part of the code like this: {code:java} // current code: //executor.reduce(rexBuilder, ImmutableList.of(e), reducedValues); // proposed code: // if converting from CHAR to TIMESTAMP, try to convert manually to improve perf boolean isFromChar = typeName == SqlTypeName.CHAR; boolean isToTimeStamp = e.getType().getSqlTypeName() == SqlTypeName.TIMESTAMP; if (isFromChar) { // if converting from CHAR... String toConvert = ((NlsString) literal.getValue()).getValue(); if (isToTimeStamp) { // ...to TIMESTAMP try { Long converted = DateTimeUtils.timestampStringToUnixDate(toConvert); reducedValues.add(rexBuilder.makeLiteral(converted, e.getType(), true)); } catch (Exception e2) { } } } if (reducedValues.isEmpty()) { // if manual conversion failed, use reduce() method executor.reduce(rexBuilder, ImmutableList.of(e), reducedValues); } long t2 = java.lang.System.nanoTime(); System.out.println(t2 - t1); {code} Shows (t2 - t1) value of 1376990. If you comment out the proposed code and uncomment the current code you will see the (t2 - t1) value of 323713768. It's a 235-fold perf difference between current and proposed code. As Danny mentioned, this is not the only code that gets run during query planning, but it gets called a lot in our case, so the numbers add up. It's possible that we have a bug in our code that calls it too many times - we'll look into that. But we want to make our product as high-perf as possible, so we welcome any perf improvements like this. > Timestamp conversion performance can be improved > ------------------------------------------------ > > Key: CALCITE-2974 > URL: https://issues.apache.org/jira/browse/CALCITE-2974 > Project: Calcite > Issue Type: Improvement > Components: core > Affects Versions: 1.19.0 > Reporter: Valeriy Trofimov > Priority: Major > Labels: easyfix, performance > Fix For: next > > Original Estimate: 24h > Remaining Estimate: 24h > > RexSimplify.simplifyCast() is slow when converting a string to SQL Timestamp > value. The slowness is caused by this line: > {code:java} > executor.reduce(rexBuilder, ImmutableList.of(e), reducedValues); > {code} > Debugging this code line with my team showed that for timestamp conversion it > loads a pre-compiled conversion code, which makes it slow. My team proposes > to replace this line with the following code that greately improves > performance: > {code:java} > if (typeName == SqlTypeName.CHAR && e.getType().getSqlTypeName() == > SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE) { > if (literal.getValue() instanceof NlsString) { > String timestampStr = ((NlsString) literal.getValue()).getValue(); > Long timestampLong = > SqlFunctions.toTimestampWithLocalTimeZone(timestampStr); > reducedValues.add(rexBuilder.makeLiteral(timestampLong, e.getType(), > true)); > } > } > if (reducedValues.isEmpty()) { > executor.reduce(rexBuilder, ImmutableList.of(e), reducedValues); > } > {code} > Let us know if we can submit a pull request with this code change or if we've > missed anything. Do you know the reason behind using a pre-compiled code for > timestamp conversion? -- This message was sent by Atlassian JIRA (v7.6.3#76005)