[GitHub] [spark] lyy-pineapple commented on pull request #38171: [SPARK-9213] [SQL] Improve regular expression performance (via joni)

2023-04-21 Thread via GitHub


lyy-pineapple commented on PR #38171:
URL: https://github.com/apache/spark/pull/38171#issuecomment-1517460306

   I would like to inquire whether the patch I submitted is eligible for 
merging into the codebase.  I understand that there may be concerns or issues 
that need to be addressed before the patch can be merged. If there are any 
concerns or questions regarding the patch, please feel free to share them with 
me so that I can address them accordingly.
   
   Thank you for your consideration.
   @LuciferYang  @cloud-fan @SparksFyz @dongjoon-hyun @jaceklaskowski 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] lyy-pineapple commented on pull request #38171: [SPARK-9213] [SQL] Improve regular expression performance (via joni)

2023-04-17 Thread via GitHub


lyy-pineapple commented on PR #38171:
URL: https://github.com/apache/spark/pull/38171#issuecomment-1511047557

   > ![image](https://user-images.githubusercontent.com/8748814/204439049-53f0bd4f-9ea0-4289-8268-d16aef5b4334.png)
   > 
   > @lyy-pineapple Would you share the test SQL pattern? I tested some cases and 
haven't seen such improvement.
   
   Many of the cases I have tested now show a large improvement. Did you test 
cases such as `like "%a%"`? Spark may convert those to `StartsWith`, `Contains`, 
or similar expressions, in which case the regex engine is not exercised.
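
To illustrate the point about simple `like` patterns: Spark's optimizer can rewrite them into plain string predicates, so a benchmark built on them may not exercise the regex engine at all. The following is only a minimal Python sketch of that idea, not Spark's code. The names `classify_like` and `_RULES` are hypothetical, and escape characters and patterns such as `a%b` are simply sent to the general regex path here.

```python
import re

# Hypothetical rule table: each entry maps a simple LIKE shape to a string predicate.
# '%' is the multi-character wildcard and '_' the single-character wildcard.
_RULES = [
    ("starts_with", re.compile(r"([^_%]+)%")),
    ("ends_with", re.compile(r"%([^_%]+)")),
    ("contains", re.compile(r"%([^_%]+)%")),
    ("equal_to", re.compile(r"([^_%]*)")),
]


def classify_like(pattern: str):
    """Return (kind, literal) for a simple LIKE pattern, else ("regex", pattern)."""
    # Escaped wildcards are not handled in this sketch; leave them to the regex path.
    if "\\" in pattern:
        return ("regex", pattern)
    for kind, rule in _RULES:
        match = rule.fullmatch(pattern)
        if match:
            return (kind, match.group(1))
    return ("regex", pattern)


if __name__ == "__main__":
    for p in ["%a%", "abc%", "%abc", "abc", "a_c%"]:
        print(p, "->", classify_like(p))
```

Only patterns that fall through to the `"regex"` case would be affected by a change of regex engine, which is one possible reason two people benchmarking different patterns see different results.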





[GitHub] [spark] lyy-pineapple commented on pull request #38171: [SPARK-9213] [SQL] Improve regular expression performance (via joni)

2023-04-17 Thread via GitHub


lyy-pineapple commented on PR #38171:
URL: https://github.com/apache/spark/pull/38171#issuecomment-1511045850

   > ![image](https://user-images.githubusercontent.com/8748814/204439049-53f0bd4f-9ea0-4289-8268-d16aef5b4334.png)
   > 
   > @lyy-pineapple Would you share the test SQL pattern? I tested some cases and 
haven't seen such improvement.
   
   Many of the cases I have tested now show a large improvement. Did you test 
cases such as `like "%a%"`? Spark may convert those to `StartsWith`, `Contains`, 
or similar expressions, in which case the regex engine is not exercised.





[GitHub] [spark] lyy-pineapple commented on pull request #38171: [SPARK-9213] [SQL] Improve regular expression performance (via joni)

2023-04-13 Thread via GitHub


lyy-pineapple commented on PR #38171:
URL: https://github.com/apache/spark/pull/38171#issuecomment-1507832785

   > Any new developments in this PR?
   
   I reworked the unit tests to make it easier to compare results between the 
two regular expression engines.





[GitHub] [spark] lyy-pineapple commented on pull request #38171: [SPARK-9213] [SQL] Improve regular expression performance (via joni)

2023-03-30 Thread via GitHub


lyy-pineapple commented on PR #38171:
URL: https://github.com/apache/spark/pull/38171#issuecomment-1489987177

   > ![image](https://user-images.githubusercontent.com/8748814/204439049-53f0bd4f-9ea0-4289-8268-d16aef5b4334.png)
   > 
   > @lyy-pineapple Would you share the test SQL pattern? I tested some cases and 
haven't seen such improvement.
   
   Could you share the cases that did not improve? Thanks.





[GitHub] [spark] lyy-pineapple commented on pull request #38171: [SPARK-9213] [SQL] Improve regular expression performance (via joni)

2023-03-30 Thread via GitHub


lyy-pineapple commented on PR #38171:
URL: https://github.com/apache/spark/pull/38171#issuecomment-1489985307

   > `joni` seems to be used in Hbase client only instead of Hbase server or 
Hbase common.
   > 
   > * https://mvnrepository.com/artifact/org.apache.hbase/hbase-client/2.5.3
   > 
   > In addition, Trino is not using `jruby/joni`. It uses `airlift/joni`, a 
wrapper of `Joni`. 
https://github.com/trinodb/trino/blob/4cabec97ff62567d6bc8bcc40786eb0ac36b65ff/pom.xml#L910
   > 
   > ```
   > <dependency>
   >   <groupId>io.airlift</groupId>
   >   <artifactId>joni</artifactId>
   >   <version>2.1.5.3</version>
   > </dependency>
   > ```
   > 
   > Given that, this seems to be used rarely. Do you think there is a reason?
   
   airlift/joni is a fork of joni that fixes some issues; joni itself has also 
fixed other issues since then. Other projects depend on joni as well 
(https://github.com/jruby/joni/network/dependents). Joni's Java pattern syntax 
is intended to be compatible with Java's matching rules. I reworked the unit 
tests so that the joni and Java results can be compared directly, which makes 
it easier to check that they agree.
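
The kind of comparison test described above can be sketched as a small differential harness: run the same patterns and inputs through two matchers and collect every disagreement. This is only an illustration in Python, not the PR's test code. Both engines below are stand-ins built on Python's `re` module (joni and `java.util.regex` are not used), and the names `compare_engines`, `engine_a`, and `engine_b` are hypothetical.

```python
import re
from typing import Callable, Iterable, List, Tuple

# A matcher takes (pattern, text) and reports whether the pattern is found.
Matcher = Callable[[str, str], bool]


def engine_a(pattern: str, text: str) -> bool:
    """Stand-in for the reference engine."""
    return re.search(pattern, text) is not None


def engine_b(pattern: str, text: str) -> bool:
    """Stand-in for the candidate engine (here the same library, precompiled)."""
    return re.compile(pattern).search(text) is not None


def compare_engines(
    a: Matcher, b: Matcher, patterns: Iterable[str], inputs: Iterable[str]
) -> List[Tuple[str, str, bool, bool]]:
    """Return (pattern, text, result_a, result_b) for every disagreement."""
    inputs = list(inputs)
    mismatches = []
    for pattern in patterns:
        for text in inputs:
            result_a, result_b = a(pattern, text), b(pattern, text)
            if result_a != result_b:
                mismatches.append((pattern, text, result_a, result_b))
    return mismatches


if __name__ == "__main__":
    patterns = [r".*abc.*cde.*", r"^\d+$", r"a|b"]
    inputs = ["xxabcyycde", "12345", "zzz", ""]
    print(compare_engines(engine_a, engine_b, patterns, inputs))
```

An empty result means the two matchers agreed on every pattern and input pair that was tried; it says nothing about pairs outside the test set.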





[GitHub] [spark] lyy-pineapple commented on pull request #38171: [SPARK-9213] [SQL] Improve regular expression performance (via joni)

2022-12-26 Thread GitBox


lyy-pineapple commented on PR #38171:
URL: https://github.com/apache/spark/pull/38171#issuecomment-1365029419

   > ![image](https://user-images.githubusercontent.com/8748814/204439049-53f0bd4f-9ea0-4289-8268-d16aef5b4334.png)
   > 
   > @lyy-pineapple Would you share the test SQL pattern? I tested some cases and 
haven't seen such improvement.
   
   I tested with a simple SQL query:
   
       select id,
              case when t1 rlike '.*abc.*cde.*' then 1
                   when t1 rlike '.*bbd*cde.*' then 2
                   when t1 rlike '.*cbe*cde.*' then 3
                   when t1 rlike '.*dbf*cde.*' then 4
                   when t1 rlike '.*ebg*cde.*' then 5
                   else 0
              end as t1
       from xxx
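
For readers who want to see what the query above computes per row, here is a rough Python equivalent of the `case when ... rlike ...` chain. It is only a sketch under assumptions: Python's `re` stands in for the Java regex engine, `rlike` is modelled as a "find anywhere" search, and the names `RULES` and `classify_t1` are hypothetical. Note that in these patterns `d*`, `e*`, `f*`, and `g*` mean zero or more of that letter, not "any characters".

```python
import re

# Patterns copied from the query above, paired with the value each branch returns.
RULES = [
    (re.compile(r".*abc.*cde.*"), 1),
    (re.compile(r".*bbd*cde.*"), 2),
    (re.compile(r".*cbe*cde.*"), 3),
    (re.compile(r".*dbf*cde.*"), 4),
    (re.compile(r".*ebg*cde.*"), 5),
]


def classify_t1(t1: str) -> int:
    """Return the value of the first matching branch, or 0 for the else branch."""
    for pattern, label in RULES:
        # search() finds the pattern anywhere in the string
        if pattern.search(t1):
            return label
    return 0


if __name__ == "__main__":
    for value in ["xxabc--cde", "bbdddcde", "bbd-cde", "ebgggcde", "nothing"]:
        print(repr(value), "->", classify_t1(value))
```

Because every row may be tested against up to five patterns, each with leading and trailing `.*`, a query of this shape spends most of its time inside the regex engine, which is presumably why it was chosen as a test case.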
   





[GitHub] [spark] lyy-pineapple commented on pull request #38171: [SPARK-9213] [SQL] Improve regular expression performance (via joni)

2022-11-01 Thread GitBox


lyy-pineapple commented on PR #38171:
URL: https://github.com/apache/spark/pull/38171#issuecomment-1299657792

   > How much confidence do we have in joni? Is it widely adopted by other 
open-source projects? I'm a bit concerned about moving away from JDK regex and 
picking a project that I just heard about.
   > 
   > also cc @HyukjinKwon @dongjoon-hyun @viirya
   
   I know that HBase and Trino also use joni for regex matching. I have also 
added a new configuration option to choose between the Java and joni engines, 
to ensure stability.





[GitHub] [spark] lyy-pineapple commented on pull request #38171: [SPARK-9213] [SQL] Improve regular expression performance (via joni)

2022-11-01 Thread GitBox


lyy-pineapple commented on PR #38171:
URL: https://github.com/apache/spark/pull/38171#issuecomment-1299505071

   Added a new benchmark with comparisons on Java 11 and Java 17. cc @cloud-fan 
@LuciferYang 





[GitHub] [spark] lyy-pineapple commented on pull request #38171: [SPARK-9213] [SQL] Improve regular expression performance (via joni)

2022-10-13 Thread GitBox


lyy-pineapple commented on PR #38171:
URL: https://github.com/apache/spark/pull/38171#issuecomment-1277442557

   Does Spark have any data that is suitable for a regular expression matching 
benchmark? @LuciferYang @cloud-fan 





[GitHub] [spark] lyy-pineapple commented on pull request #38171: [SPARK-9213] [SQL] Improve regular expression performance (via joni)

2022-10-11 Thread GitBox


lyy-pineapple commented on PR #38171:
URL: https://github.com/apache/spark/pull/38171#issuecomment-1274563298

   > sql test is better, but simple test is OK
   
   Hi, I ran two benchmarks: one with simple data and one with the text from 
https://github.com/mariomka/regex-benchmark/blob/master/input-text.txt 
cc @LuciferYang 
   Is it necessary to keep both of them?
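
As a rough picture of what a text-based regex benchmark of this kind measures, the sketch below times how long it takes to count all matches of a pattern in a block of text and keeps the best of several runs. It is an illustration only: it uses Python's `re` rather than the Java or joni engines, the sample text and patterns are made up for the example (they are not the contents of `input-text.txt`), and the names `bench`, `SAMPLE`, and `PATTERNS` are hypothetical.

```python
import re
import time

# Made-up sample text, repeated to give the matcher some work to do.
SAMPLE = (
    "contact: alice@example.com, bob@example.org\n"
    "host 192.168.0.1 fetched https://example.com/path?q=1\n"
) * 1000

# Illustrative patterns only.
PATTERNS = {
    "email": r"[\w.+-]+@[\w.-]+\.[\w.-]+",
    "ipv4": r"(?:\d{1,3}\.){3}\d{1,3}",
}


def bench(name: str, pattern: str, text: str, repeats: int = 5):
    """Return (name, match_count, best_seconds) over several timed runs."""
    compiled = re.compile(pattern)
    best = float("inf")
    count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        count = sum(1 for _ in compiled.finditer(text))
        best = min(best, time.perf_counter() - start)
    return name, count, best


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        label, count, seconds = bench(name, pattern, SAMPLE)
        print(f"{label}: {count} matches, best of 5 runs {seconds * 1000:.2f} ms")
```

Reporting the match count next to the timing is a cheap sanity check: if two engines disagree on the count, their timings are not comparable.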





[GitHub] [spark] lyy-pineapple commented on pull request #38171: [SPARK-9213] [SQL] Improve regular expression performance (via joni)

2022-10-10 Thread GitBox


lyy-pineapple commented on PR #38171:
URL: https://github.com/apache/spark/pull/38171#issuecomment-1272891669

   > sql test is better, but simple test is OK
   
   
![image](https://user-images.githubusercontent.com/46274164/194816709-980e5062-2d05-4e95-b0bc-d83e37a86555.png)
   Can I add this test to Spark? This requires adding a large test file 
https://github.com/mariomka/regex-benchmark/blob/master/input-text.txt





[GitHub] [spark] lyy-pineapple commented on pull request #38171: [SPARK-9213] [SQL] Improve regular expression performance (via joni)

2022-10-09 Thread GitBox


lyy-pineapple commented on PR #38171:
URL: https://github.com/apache/spark/pull/38171#issuecomment-1272834965

   > Can you also add a related micro-benchmark for Spark?
   
   If I use SqlBasedBenchmark for the test, I'm not sure how to create a dataset 
and override the regex matching rules. Do you have any suggestions? Or would a 
simple test be enough?
   





[GitHub] [spark] lyy-pineapple commented on pull request #38171: [SPARK-9213] [SQL] Improve regular expression performance (via joni)

2022-10-09 Thread GitBox


lyy-pineapple commented on PR #38171:
URL: https://github.com/apache/spark/pull/38171#issuecomment-1272746644

   > @lyy-pineapple please run `./dev/test-dependencies.sh --replace-manifest` 
locally and add the changed `spark-deps-hadoop-x-hive-2.3` files to this pr
   
   Thanks, I have done it.

