[ 
https://issues.apache.org/jira/browse/CALCITE-2974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818220#comment-16818220
 ] 

Valeriy Trofimov edited comment on CALCITE-2974 at 4/15/19 5:54 PM:
--------------------------------------------------------------------

Doing a quick non-scientific perf measure by running a unit test like this:

 
{code:java}
@Test public void testCastTimestampJIRA2974() {
  // Test cast for date/time/timestamp
  tester.setFor(SqlStdOperatorTable.CAST);

  tester.checkScalar(
      "cast('1945-02-24 12:42:25.34' as TIMESTAMP)",
      "1945-02-24 12:42:25",
      "TIMESTAMP(0) NOT NULL");
}
{code}
On a RexSimplify.simplifyCast part of the code like this:  
{code:java}
long t1 = java.lang.System.nanoTime();

// current code:
//executor.reduce(rexBuilder, ImmutableList.of(e), reducedValues);

// proposed code:
// if converting from CHAR to TIMESTAMP, try to convert manually to improve perf
boolean isFromChar = typeName == SqlTypeName.CHAR;
boolean isToTimeStamp = e.getType().getSqlTypeName() == SqlTypeName.TIMESTAMP;
if (isFromChar) { // if converting from CHAR...
  String toConvert = ((NlsString) literal.getValue()).getValue();
  if (isToTimeStamp) { // ...to TIMESTAMP
      try {
        Long converted = DateTimeUtils.timestampStringToUnixDate(toConvert);
        reducedValues.add(rexBuilder.makeLiteral(converted, e.getType(), true));
      } catch (Exception e2) {
      }
  }
}
if (reducedValues.isEmpty()) { // if manual conversion failed, use reduce() 
method
  executor.reduce(rexBuilder, ImmutableList.of(e), reducedValues);
}

long t2 = java.lang.System.nanoTime();
System.out.println(t2 - t1);
{code}
Shows (t2 - t1) value of 1376990. If you comment out the proposed code and 
uncomment the current code you will see the (t2 - t1) value of 323713768. It's 
a 235-fold perf difference between current and proposed code. As Danny 
mentioned, this is not the only code that gets run during query planning, but 
it gets called a lot in our case, so the numbers add up. It's possible that we 
have a bug in our code that calls it too many times - we'll look into that. But 
we want to make our product as high-perf as possible, so we welcome any perf 
improvements like this.

 


was (Author: valtroffuture):
Doing a quick non-scientific perf measure by running a unit test like this:

 
{code:java}
@Test public void testCastTimestampJIRA2974() {
  // Test cast for date/time/timestamp
  tester.setFor(SqlStdOperatorTable.CAST);

  tester.checkScalar(
      "cast('1945-02-24 12:42:25.34' as TIMESTAMP)",
      "1945-02-24 12:42:25",
      "TIMESTAMP(0) NOT NULL");
}
{code}
On a RexSimplify.simplifyCast part of the code like this:

 

 
{code:java}
// current code:
//executor.reduce(rexBuilder, ImmutableList.of(e), reducedValues);

// proposed code:
// if converting from CHAR to TIMESTAMP, try to convert manually to improve perf
boolean isFromChar = typeName == SqlTypeName.CHAR;
boolean isToTimeStamp = e.getType().getSqlTypeName() == SqlTypeName.TIMESTAMP;
if (isFromChar) { // if converting from CHAR...
  String toConvert = ((NlsString) literal.getValue()).getValue();
  if (isToTimeStamp) { // ...to TIMESTAMP
      try {
        Long converted = DateTimeUtils.timestampStringToUnixDate(toConvert);
        reducedValues.add(rexBuilder.makeLiteral(converted, e.getType(), true));
      } catch (Exception e2) {
      }
  }
}
if (reducedValues.isEmpty()) { // if manual conversion failed, use reduce() 
method
  executor.reduce(rexBuilder, ImmutableList.of(e), reducedValues);
}

long t2 = java.lang.System.nanoTime();
System.out.println(t2 - t1);
{code}
Shows (t2 - t1) value of 1376990. If you comment out the proposed code and 
uncomment the current code you will see the (t2 - t1) value of 323713768. It's 
a 235-fold perf difference between current and proposed code. As Danny 
mentioned, this is not the only code that gets run during query planning, but 
it gets called a lot in our case, so the numbers add up. It's possible that we 
have a bug in our code that calls it too many times - we'll look into that. But 
we want to make our product as high-perf as possible, so we welcome any perf 
improvements like this.

 

> Timestamp conversion performance can be improved
> ------------------------------------------------
>
>                 Key: CALCITE-2974
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2974
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.19.0
>            Reporter: Valeriy Trofimov
>            Priority: Major
>              Labels: easyfix, performance
>             Fix For: next
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> RexSimplify.simplifyCast() is slow when converting a string to SQL Timestamp 
> value. The slowness is caused by this line:
> {code:java}
> executor.reduce(rexBuilder, ImmutableList.of(e), reducedValues);
> {code}
> Debugging this code line with my team showed that for timestamp conversion it 
> loads a pre-compiled conversion code, which makes it slow. My team proposes 
> to replace this line with the following code that greately improves 
> performance:
> {code:java}
> if (typeName == SqlTypeName.CHAR && e.getType().getSqlTypeName() == 
> SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE) {
>     if (literal.getValue() instanceof NlsString) {
>         String timestampStr = ((NlsString) literal.getValue()).getValue();
>         Long timestampLong = 
> SqlFunctions.toTimestampWithLocalTimeZone(timestampStr);
>         reducedValues.add(rexBuilder.makeLiteral(timestampLong, e.getType(), 
> true));
>     }
> }
> if (reducedValues.isEmpty()) {
>     executor.reduce(rexBuilder, ImmutableList.of(e), reducedValues);
> }
> {code}
> Let us know if we can submit a pull request with this code change or if we've 
> missed anything. Do you know the reason behind using a pre-compiled code for 
> timestamp conversion?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to