ulysses-you commented on pull request #26875:
URL: https://github.com/apache/spark/pull/26875#issuecomment-646699595


   Env: CentOS 7, 40 cores, 4GB
   
   ---- test 1 ----
   ```
   val df1 = spark.range(0, 20000, 1, 200).selectExpr("uuid() as c1")
   val df2 = spark.range(0, 20000, 1, 200).selectExpr("uuid() as c2")
   val start = System.currentTimeMillis
   df1.join(df2).where("c2 like c1").count()
   // 3 runs, times in ms
   // before  159228, 157541, 157721
   // after   14378,  11545,  11498
   println(System.currentTimeMillis - start)
   ```
   ---- test 2 ----
   ```
   // 17+1 length strings
   val df1 = spark.range(0, 20000, 1, 200).selectExpr("concat('aaaaaaaaaaaaaaaaa', id%2) as c1")
   val df2 = spark.range(0, 20000, 1, 200).selectExpr("concat('bbbbbbbbbbbbbbbbb', id%2) as c2")
   val start = System.currentTimeMillis
   df1.join(df2).where("c2 like c1").count()
   // 3 runs, times in ms
   // before  90054, 90350, 90283
   // after   13077, 10097, 9770
   println(System.currentTimeMillis - start)
   ```
   
   About a 10x performance improvement. It seems a plain string equality check is faster than compiling the pattern, and the gain grows with longer strings.
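   The intuition above can be sketched in plain Scala. This is a hypothetical standalone helper (`evalLike` is not Spark's actual implementation): a LIKE pattern containing no `%` or `_` wildcards matches exactly one string, so it can be evaluated with equality instead of compiling a regex per comparison.
   ```scala
   object LikeSimplification {
     // True if the LIKE pattern contains a SQL wildcard (escapes ignored for brevity).
     def hasWildcard(pattern: String): Boolean =
       pattern.exists(c => c == '%' || c == '_')

     // Evaluate `input LIKE pattern`: take the cheap equality path when possible,
     // otherwise translate the LIKE pattern into a regex and match.
     def evalLike(input: String, pattern: String): Boolean =
       if (!hasWildcard(pattern)) {
         input == pattern // fast path: no regex compilation needed
       } else {
         val regex = pattern.flatMap {
           case '%' => ".*"                                      // any sequence
           case '_' => "."                                       // any single char
           case c   => java.util.regex.Pattern.quote(c.toString) // literal char
         }
         input.matches(regex)
       }
   }
   ```
   In the benchmark, `c1` is non-literal, so without the optimization the pattern is compiled for every row of the cross join; the equality path skips that entirely.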
   cc @HyukjinKwon 

