Hi, you can refer to https://issues.apache.org/jira/browse/SPARK-14083 for more detail.
For performance reasons, it is better to use the DataFrame API than the Dataset API.

On Sat, Feb 25, 2017 at 2:45 AM, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi Justin,
>
> I have never seen such a list. I think the area is in heavy development,
> esp. optimizations for typed operations.
>
> There's a JIRA to somehow find out more on the behavior of Scala code
> (non-Column-based one from your list) but I've seen no activity in this
> area. That's why for now Column-based untyped queries could be faster due
> to more optimizations applied. Same about UDFs.
>
> Jacek
>
> On 23 Feb 2017 7:52 a.m., "Justin Pihony" <justin.pih...@gmail.com> wrote:
>
>> I was curious if there was introspection of certain typed functions and
>> ran the following two queries:
>>
>> ds.where($"col" > 1).explain
>> ds.filter(_.col > 1).explain
>>
>> And found that the typed function does NOT result in a PushedFilter. I
>> imagine this is due to a limited view of the function, so I have two
>> questions really:
>>
>> 1.) Is there a list of the methods that lose some of the optimizations
>> that you get from non-functional methods? Is it any method that accepts
>> a generic function?
>> 2.) Is there any work to attempt reflection and gain some of these
>> optimizations back? I couldn't find anything in JIRA.
>>
>> Thanks,
>> Justin Pihony
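
For anyone who wants to reproduce the comparison from the quoted question, here is a minimal sketch. The case class `Record`, the object name, and the Parquet path are made up for illustration; only the two queries come from the thread. It assumes a local Spark 2.x or later session with `spark-sql` on the classpath, and the exact plan text printed by `explain` will vary between Spark versions.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type; the field is named `col` to match the queries in the thread.
case class Record(col: Int)

object TypedVsUntypedFilter {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("typed-vs-untyped-filter")
      .getOrCreate()
    import spark.implicits._

    // Write a tiny Parquet file so the scan has a source that supports filter pushdown.
    // The path is an assumption; any writable location works.
    val path = "/tmp/typed-vs-untyped-demo.parquet"
    Seq(Record(0), Record(1), Record(2)).toDS().write.mode("overwrite").parquet(path)

    val ds = spark.read.parquet(path).as[Record]

    // Untyped, Column-based predicate: Catalyst sees the expression `col > 1`.
    ds.where($"col" > 1).explain()

    // Typed predicate: the lambda is an opaque Scala function to the optimizer.
    ds.filter(_.col > 1).explain()

    spark.stop()
  }
}
```

Comparing the two printed physical plans side by side, in particular the PushedFilters entry of the Parquet scan, should show the difference Justin describes.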