Re: Regexp_extract not giving correct output

2020-12-02 Sean Owen
This means your regex uses something that Java's regex syntax doesn't support. Do you mean "(?:" rather than "(?" around where the error is? This is not related to Spark.

On Wed, Dec 2, 2020 at 9:45 AM Sachit Murarka wrote:
> Hi Sean,
>
> Thanks for the quick response!
>
> I have tried with string
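Sean's diagnosis can be checked outside Spark. The construct at the error position, `(?(1)yes|no)`, is a conditional group: Python's `re` module accepts it, but Java's `java.util.regex.Pattern`, which Spark's `regexp_extract` compiles patterns with, has no conditional syntax. The minimal sketch below rewrites the conditional as a plain alternation of the same two branches, which Java does accept, and uses Python's `re` to check that both forms extract the same value. The names `CONDITIONAL`, `ALTERNATION`, `extract_conditional`, `extract_alternation` and the `[OrderID: 42] ...` sample line are hypothetical, chosen for illustration.

```python
import re

# Pattern from the thread. "(?(1)yes|no)" is a conditional group: Python's re
# accepts it, java.util.regex.Pattern does not.
CONDITIONAL = (
    r"(^\[OrderID:\s)?"
    r"(?(1).*\]\s\[UniqueID:\s([a-z0-9A-Z]*)\].*"
    r"|\[.*\]\s\[([a-z0-9A-Z]*)\].*)"
)

# Hypothetical Java-compatible rewrite: the same two branches as an ordinary
# alternation. Group 1 = UniqueID branch, group 2 = fallback branch.
ALTERNATION = (
    r"^\[OrderID:\s.*\]\s\[UniqueID:\s([a-z0-9A-Z]*)\].*"
    r"|\[.*\]\s\[([a-z0-9A-Z]*)\].*"
)


def extract_conditional(text: str) -> str:
    """Return whichever ID group matched (group 2 or 3), or "" if none."""
    m = re.search(CONDITIONAL, text)
    return (m.group(2) or m.group(3) or "") if m else ""


def extract_alternation(text: str) -> str:
    """Same extraction with the conditional-free pattern (group 1 or 2)."""
    m = re.search(ALTERNATION, text)
    return (m.group(1) or m.group(2) or "") if m else ""


samples = [
    "[OrderID: 42] [UniqueID: abc123] tail",  # hypothetical OrderID line
    "[11] [22] [33]",
    "[1234] [] [] [66]",
    "abcd",
]

for s in samples:
    print(repr(s), "->", repr(extract_conditional(s)), repr(extract_alternation(s)))
```

Because the rewrite keeps two capture groups, a Spark query using it would need to extract index 1 and index 2 separately and keep whichever is non-empty.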

Re: Regexp_extract not giving correct output

2020-12-02 Sachit Murarka
Hi Sean,

Thanks for the quick response!

I have tried with the 'r' string-literal prefix; that also gave an empty result:

spark.sql(r"select regexp_extract('[11] [22] [33]','(^\[OrderID:\s)?(?(1).*\]\s\[UniqueID:\s([a-z0-9A-Z]*)\].*|\[.*\]\s\[([a-z0-9A-Z]*)\].*)',1) as anyid").show()

and as I
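One likely reason the `r` prefix did not help: it only changes how Python parses the string. `spark.sql()` then parses the SQL text itself, and the Spark SQL built-in function documentation notes that, since Spark 2.0, string literals (including regex patterns) are unescaped by the SQL parser, so matching `\abc` takes the pattern `^\\abc$`; the `spark.sql.parser.escapedStringLiterals` setting restores the older behavior. A single `\[` or `\s` inside the SQL literal therefore probably reaches Java's regex engine as a bare `[` or `s`, which changes the pattern's meaning (a bare `[` starts a character class). A minimal sketch, assuming the default parser setting, of escaping a regex before embedding it in `spark.sql()` text; `to_spark_sql_literal` is a hypothetical helper, not a Spark API:

```python
def to_spark_sql_literal(pattern: str) -> str:
    """Hypothetical helper: quote a regex for use inside spark.sql() text.

    Assumes Spark's default string-literal parsing (a backslash starts an
    escape sequence, spark.sql.parser.escapedStringLiterals=false): every
    backslash is doubled and single quotes are escaped so the regex engine
    receives the pattern unchanged.
    """
    escaped = pattern.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


# Fallback branch of the thread's pattern, written once as a Python raw string.
pattern = r"\[.*\]\s\[([a-z0-9A-Z]*)\].*"
query = (
    "select regexp_extract('[11] [22] [33]', "
    + to_spark_sql_literal(pattern)
    + ", 1) as anyid"
)
print(query)
# With a SparkSession available: spark.sql(query).show()
```

Alternatively, `pyspark.sql.functions.regexp_extract(str, pattern, idx)` in the DataFrame API takes the pattern as a plain Python string and does not go through the SQL string-literal parser, so a raw string can be passed as-is.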

Re: Regexp_extract not giving correct output

2020-12-02 Sean Owen
As in Java/Scala, in Python you'll need to escape the backslashes with \\. "\[" means just "[" in a string. I think you could also prefix the string literal with 'r' to disable Python's handling of escapes.

On Wed, Dec 2, 2020 at 9:34 AM Sachit Murarka wrote:
> Hi All,
>
> I am using PySpark to
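At the Python level, the two spellings Sean mentions produce the same string: doubling each backslash in a normal literal, or using an `r` prefix. The small check below uses Python's `re` purely as an illustration (Spark itself compiles patterns with Java's engine) to show that the escaped `\[` matches a literal bracket, while dropping the backslash leaves an unterminated character class. Strictly speaking, Python keeps an unrecognized escape such as `\[` in a normal literal (recent versions warn about it), so in this thread the step most likely to strip the backslash is the Spark SQL string-literal parser inside `spark.sql()`. The variable names are illustrative.

```python
import re

# The same regex fragment spelled two ways in Python source.
escaped = "(^\\[OrderID:\\s)?"  # backslashes doubled in a normal literal
raw = r"(^\[OrderID:\s)?"       # raw literal: backslashes kept as written
print(escaped == raw)
print(len(r"\["))  # a backslash plus a bracket: 2 characters

# The escaped bracket matches a literal "[".
m = re.match(raw, "[OrderID: 42]")
print(repr(m.group(1)))

# Without the backslash, "[" opens a character class that is never closed.
try:
    re.compile("(^[OrderID:\\s)?")
    unescaped_error = None
except re.error as exc:
    unescaped_error = str(exc)
print("unescaped bracket:", unescaped_error)
```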

Regexp_extract not giving correct output

2020-12-02 Sachit Murarka
Hi All,

I am using PySpark to get the value from a column on the basis of a regex. The following is the regex I am using:

(^\[OrderID:\s)?(?(1).*\]\s\[UniqueID:\s([a-z0-9A-Z]*)\].*|\[.*\]\s\[([a-z0-9A-Z]*)\].*)

df = spark.createDataFrame([("[1234] [] [] [66]",), ("abcd",)], ["stringValue"])
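For reference, the regex has three capture groups: group 1 is the optional `[OrderID: ` prefix, group 2 the UniqueID in the "yes" branch of the conditional, and group 3 the bracketed value in the "no" branch. The PySpark documentation for `regexp_extract` says that if the regex did not match, or the specified group did not match, an empty string is returned. The sketch below mimics that behavior with Python's `re` (which, unlike Java's engine, accepts the conditional) on the two sample rows; `regexp_extract_py` and `PATTERN` are hypothetical names, not Spark APIs. It suggests that, for these rows, index 1 would be empty even if the pattern compiled, and the bracketed value lands in group 3.

```python
import re

# The thread's pattern; Python's re accepts the (?(1)yes|no) conditional.
PATTERN = (
    r"(^\[OrderID:\s)?"
    r"(?(1).*\]\s\[UniqueID:\s([a-z0-9A-Z]*)\].*"
    r"|\[.*\]\s\[([a-z0-9A-Z]*)\].*)"
)


def regexp_extract_py(text: str, pattern: str, idx: int) -> str:
    """Hypothetical stand-in for Spark's regexp_extract, using Python's re.

    Returns group `idx` of the first match, or "" when there is no match or
    the group did not participate in the match.
    """
    m = re.search(pattern, text)
    if m is None:
        return ""
    group = m.group(idx)
    return group if group is not None else ""


# The two rows from the original createDataFrame call.
rows = ["[1234] [] [] [66]", "abcd"]
for value in rows:
    print(repr(value), [regexp_extract_py(value, PATTERN, i) for i in (1, 2, 3)])
```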