
Re: What is the difference for the following UDFs?

2019-05-14 Thread Qian He

Hi Jacek,

Thanks for your reply. The case you provided is actually the same as the second option in my original email. What I was wondering about is the difference between the two options in terms of query performance or efficiency.

On Tue, May 14, 2019 at 3:51 PM Jacek Laskowski wrote:
> Hi,
>
> For this

Re: What is the difference for the following UDFs?

2019-05-14 Thread Jacek Laskowski

Hi,

For this particular case I'd use Column.substr (http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column), e.g.

    val ns = Seq(("hello world", 1, 5)).toDF("w", "b", "e")

    scala> ns.select($"w".substr($"b", $"e" - $"b" + 1) as "demo").show
    +-+
    | demo|
    +-+
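To make the `$"e" - $"b" + 1` arithmetic in the call above concrete: Column.substr takes a 1-based start position and a length, not an end index, so an inclusive end has to be converted to a length first. The plain-Scala sketch below mirrors that convention for positive start positions only; `substr1` is a hypothetical helper written for illustration, not part of Spark.

```scala
// Hypothetical plain-Scala analogue of Column.substr(startPos, len):
// startPos is 1-based and len is a character count (not an end index).
// Only positive start positions are handled in this sketch.
def substr1(s: String, startPos: Int, len: Int): String = {
  val from = startPos - 1                        // convert to a 0-based index
  s.substring(from, math.min(s.length, from + len))
}

val (w, b, e) = ("hello world", 1, 5)

// An inclusive [b, e] range covers e - b + 1 characters.
println(substr1(w, b, e - b + 1)) // prints "hello"
```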

What is the difference for the following UDFs?

2019-05-14 Thread Qian He

For example, I have a DataFrame with three columns: URL, START, and END. For each URL in the URL column, I want to extract the substring that starts at START and ends at END.

++--+-+
|URL|START |END |
++--+-+
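One way to look into the performance side of this question is to build both variants and compare their physical plans. The sketch below is only an illustration under stated assumptions: the sample row, the `substrUdf` name, and the reading of START and END as 1-based, inclusive positions are all hypothetical, since the original options are not shown in full here. In general, a Scala UDF is opaque to the Catalyst optimizer, while a built-in expression such as Column.substr is visible to it and can take part in code generation, so the built-in form is usually the safer default.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object SubstrComparison {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("substr-comparison")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; START and END are assumed 1-based and inclusive.
    val df = Seq(("http://spark.apache.org", 8, 12)).toDF("URL", "START", "END")

    // Variant 1: a Scala UDF (hypothetical name). Spark cannot look inside the function body.
    val substrUdf = udf((url: String, start: Int, end: Int) => url.substring(start - 1, end))
    val viaUdf = df.select(substrUdf($"URL", $"START", $"END") as "part")

    // Variant 2: the built-in Column.substr(startPos, len), which the optimizer understands.
    val viaBuiltin = df.select($"URL".substr($"START", $"END" - $"START" + 1) as "part")

    // Print the physical plans so the two variants can be compared side by side.
    viaUdf.explain()
    viaBuiltin.explain()

    viaUdf.show()
    viaBuiltin.show()

    spark.stop()
  }
}
```

Comparing the two `explain()` outputs shows how each variant is planned; timing both on realistic data would be needed before drawing any conclusion about actual speed.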