Re: What is the difference for the following UDFs?

2019-05-14 Thread Qian He
Hi Jacek,

Thanks for your reply. The case you provided is actually the same as the second option in my original email. What I'm wondering about is the difference between the two options in terms of query performance or efficiency.

On Tue, May 14, 2019 at 3:51 PM Jacek Laskowski wrote:
> Hi,
>
> For this

Re: What is the difference for the following UDFs?

2019-05-14 Thread Jacek Laskowski
Hi,

For this particular case I'd use Column.substr (http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column), e.g.

val ns = Seq(("hello world", 1, 5)).toDF("w", "b", "e")

scala> ns.select($"w".substr($"b", $"e" - $"b" + 1) as "demo").show
+-----+
| demo|
+-----+
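The second argument in the call above, `$"e" - $"b" + 1`, turns an inclusive, 1-based (begin, end) pair into the (position, length) pair that Column.substr takes. As a rough plain-Scala sketch of that same arithmetic, without Spark: the object name `SubstrSketch` and the helper `sliceInclusive` are made up for illustration and are not part of any Spark API.

```scala
// Hypothetical helper (not a Spark API): mirrors the arithmetic used with
// Column.substr above, assuming b and e are 1-based and both inclusive.
object SubstrSketch {
  def sliceInclusive(s: String, b: Int, e: Int): String = {
    val len = e - b + 1            // same length computation as $"e" - $"b" + 1
    val from = b - 1               // convert the 1-based position to a 0-based index
    s.slice(from, from + len)      // slice tolerates out-of-range bounds
  }
}
```

For example, `SubstrSketch.sliceInclusive("hello world", 1, 5)` covers characters 1 through 5 of the string. A function like this is the sort of thing one might wrap in a UDF, whereas Column.substr expresses the same computation with built-in column expressions.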

What is the difference for the following UDFs?

2019-05-14 Thread Qian He
For example, I have a dataframe with 3 columns: URL, START, END. For each URL in the URL column, I want to fetch the substring that starts at START and ends at END.

++--+-+
|URL|START |END |
++--+-+