This means there is something wrong with your regex vs what Java supports.
Do you mean "(?:" rather than "(?" around where the error is? This is not
related to Spark.
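For what it's worth, a Java-compatible rewrite drops the conditional group entirely. The sketch below is not from the thread — the flattened pattern and the extract_id helper are illustrative — and uses Python's re module, which shares the same alternation syntax, to show the idea:

```python
import re

# Sketch (not from the thread): Java's regex engine has no conditional
# groups like "(?(1)...)", so the conditional can be flattened into a
# plain alternation. The ID then lands in group 1 or group 2, depending
# on which branch matched.
PATTERN = re.compile(
    r'^\[OrderID:\s.*\]\s\[UniqueID:\s([a-z0-9A-Z]*)\]'  # branch 1 -> group 1
    r'|\[.*\]\s\[([a-z0-9A-Z]*)\]'                        # branch 2 -> group 2
)

def extract_id(s):
    """Return the ID from whichever branch matched, or '' on no match."""
    m = PATTERN.search(s)
    if not m:
        return ''
    return m.group(1) if m.group(1) is not None else m.group(2)
```

In Spark one would run regexp_extract once per group index and coalesce the two results, since a single regexp_extract call pulls out only one group.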
On Wed, Dec 2, 2020 at 9:45 AM Sachit Murarka wrote:
Hi Sean,
Thanks for the quick response!
I tried prefixing the string literal with 'r', but that also gave an empty
result:
spark.sql(r"select regexp_extract('[11] [22] [33]','(^\[OrderID:\s)?(?(1).*\]\s\[UniqueID:\s([a-z0-9A-Z]*)\].*|\[.*\]\s\[([a-z0-9A-Z]*)\].*)',1) as anyid").show()
As in Java/Scala, in Python you'll need to escape the backslashes with \\.
"\[" means just "[" in a string. I think you could also prefix the string
literal with 'r' to disable Python's handling of escapes.
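As a quick sanity check of the escaping rules (an illustrative snippet, not from the thread — the variable names are made up):

```python
# Two equivalent ways to get a literal backslash into the pattern string
# that Python hands to Spark's (Java) regex engine.
escaped = "\\[UniqueID"   # doubled backslash in a normal string
raw = r"\[UniqueID"       # raw-string literal: escape processing disabled
assert escaped == raw
assert len(r"\[") == 2    # the backslash survives in a raw string
```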
On Wed, Dec 2, 2020 at 9:34 AM Sachit Murarka wrote:
Hi All,
I am using PySpark to get the value from a column on the basis of a regex.
Following is the regex which I am using:
(^\[OrderID:\s)?(?(1).*\]\s\[UniqueID:\s([a-z0-9A-Z]*)\].*|\[.*\]\s\[([a-z0-9A-Z]*)\].*)
df = spark.createDataFrame([("[1234] [] [] [66]",), ("abcd",)], ["stringValue"])
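For reference, Python's own re module does accept conditional groups, so it can show why extracting group 1 returns an empty result even where the pattern matches: group 1 is only the optional OrderID prefix, while the IDs sit in groups 2 and 3. A small sketch using the pattern quoted above:

```python
import re

# The pattern from the thread, verbatim. Group 1 is the optional
# "[OrderID: " prefix; the actual IDs are captured by group 2 (UniqueID
# branch) and group 3 (fallback branch).
pat = re.compile(
    r'(^\[OrderID:\s)?'
    r'(?(1).*\]\s\[UniqueID:\s([a-z0-9A-Z]*)\].*'
    r'|\[.*\]\s\[([a-z0-9A-Z]*)\].*)'
)

m = pat.search('[11] [22] [33]')
print(m.group(1))  # None -> this is what regexp_extract(..., 1) asks for
print(m.group(3))  # 33   -> the ID lives here
```

Spark's Java regex engine does not support the "(?(1)...)" construct at all, which is likely why the same query misbehaves under Spark rather than returning group 3.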