[ https://issues.apache.org/jira/browse/SPARK-30763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
jiaan.geng updated SPARK-30763:
-------------------------------
Description:
The current implementation of regexp_extract throws an unhandled exception, shown below:

SELECT regexp_extract('1a 2b 14m', '\\d+')
{code:java}
[info] org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 22.0 failed 1 times, most recent failure: Lost task 1.0 in stage 22.0 (TID 33, 192.168.1.6, executor driver): java.lang.IndexOutOfBoundsException: No group 1
[info] at java.util.regex.Matcher.group(Matcher.java:538)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
[info] at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[info] at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
[info] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[info] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[info] at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1804)
[info] at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1227)
[info] at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1227)
[info] at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2156)
[info] at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
[info] at org.apache.spark.scheduler.Task.run(Task.scala:127)
[info] at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
[info] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
[info] at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
[info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info] at java.lang.Thread.run(Thread.java:748)
{code}
I think we should handle this exception properly.
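The root cause is that Matcher.group(idx) is called with a group index larger than the number of capturing groups in the pattern (`\\d+` has zero groups, but regexp_extract defaults to group 1). A minimal, hypothetical sketch (not Spark's actual implementation; class and method names are illustrative) of the kind of guard that would turn this into a clear error: validate the requested index against Pattern's group count before extracting.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not Spark's code: checks the group index up front so
// callers get an explanatory error instead of "IndexOutOfBoundsException: No group 1".
public class SafeRegexpExtract {
    static String extract(String subject, String regex, int groupIdx) {
        Pattern pattern = Pattern.compile(regex);
        // groupCount() is a property of the pattern, so an empty matcher suffices.
        int groupCount = pattern.matcher("").groupCount();
        if (groupIdx < 0 || groupIdx > groupCount) {
            throw new IllegalArgumentException(
                "Regex group index " + groupIdx + " exceeds group count "
                + groupCount + " of pattern '" + regex + "'");
        }
        Matcher m = pattern.matcher(subject);
        if (m.find()) {
            String g = m.group(groupIdx);
            return g == null ? "" : g;  // mirror regexp_extract's empty-string behavior
        }
        return "";
    }

    public static void main(String[] args) {
        // First match of "(\d+)([a-z]+)" in "1a 2b 14m" is "1a"; group 2 is "a".
        System.out.println(extract("1a 2b 14m", "(\\d+)([a-z]+)", 2)); // prints "a"
        // extract("1a 2b 14m", "\\d+", 1) now fails fast with a clear
        // IllegalArgumentException rather than a raw exception from Matcher.
    }
}
```

With such a check, the failing query above would report that the pattern has no capturing groups instead of aborting the stage with an opaque executor-side stack trace.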
was:
The current implementation of regexp_extract throws an unhandled exception, shown below:

SELECT regexp_extract('1a 2b 14m', '\\d+')
{code:java}
[info] org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 22.0 failed 1 times, most recent failure: Lost task 1.0 in stage 22.0 (TID 33, 192.168.1.6, executor driver): java.lang.IndexOutOfBoundsException: No group 2
[info] at java.util.regex.Matcher.group(Matcher.java:538)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
[info] at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[info] at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
[info] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[info] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
[info] at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1804)
[info] at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1227)
[info] at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1227)
[info] at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2156)
[info] at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
[info] at org.apache.spark.scheduler.Task.run(Task.scala:127)
[info] at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
[info] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
[info] at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
[info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info] at java.lang.Thread.run(Thread.java:748)
{code}
I think we should handle this exception properly.
> Fix java.lang.IndexOutOfBoundsException No group 1 for regexp_extract
> ---------------------------------------------------------------------
>
>                 Key: SPARK-30763
>                 URL: https://issues.apache.org/jira/browse/SPARK-30763
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.5, 3.0.0
>            Reporter: jiaan.geng
>            Priority: Major
>
> The current implementation of regexp_extract throws an unhandled exception, shown below:
> SELECT regexp_extract('1a 2b 14m', '\\d+')
>
> {code:java}
> [info] org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 22.0 failed 1 times, most recent failure: Lost task 1.0 in stage 22.0 (TID 33, 192.168.1.6, executor driver): java.lang.IndexOutOfBoundsException: No group 1
> [info] at java.util.regex.Matcher.group(Matcher.java:538)
> [info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
> [info] at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> [info] at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
> [info] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
> [info] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
> [info] at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1804)
> [info] at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1227)
> [info] at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1227)
> [info] at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2156)
> [info] at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> [info] at org.apache.spark.scheduler.Task.run(Task.scala:127)
> [info] at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
> [info] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
> [info] at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
> [info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [info] at java.lang.Thread.run(Thread.java:748)
> {code}
>
> I think we should handle this exception properly.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org