Re: createDataFrame allows column names as second param in Python not in Scala
We can't drop the existing createDataFrame one, since that would break API compatibility, and the existing one also automatically infers the column names for case classes (in that case users most likely won't be declaring names directly).

If this is really a problem, we should just create a new function (maybe more than one, since you could argue the one for Seq should also have that ...).

On Sun, May 3, 2015 at 2:13 AM, Olivier Girardot o.girar...@lateral-thoughts.com wrote:

I have the perfect counterexample, where some of the data scientists prototype in Python and the production material is done in Scala. But I get your point; as a matter of fact, I realised that the toDF method took parameters a little while after posting this.

However, toDF still requires you to go from a List to an RDD, or to create a useless DataFrame and call toDF on it, re-creating a complete data structure. I just feel that createDataFrame(_: Seq) is not really useful as it is, because I think there are practically no circumstances where you'd want to create a DataFrame without column names.

I'm not suggesting that an n-th overloaded method should be created, but rather that the signature of the existing method be changed to take an optional Seq of column names.

Regards,
Olivier.

On Sun, May 3, 2015 at 07:44, Reynold Xin r...@databricks.com wrote:

Part of the reason is that it is really easy to just call toDF in Scala, and we already have a lot of createDataFrame functions.

(You might find some of the cross-language differences confusing, but I'd argue most real users just stick to one language, and developers or trainers are the only ones that need to constantly switch between languages).

On Sat, May 2, 2015 at 11:05 AM, Olivier Girardot o.girar...@lateral-thoughts.com wrote:

Hi everyone,

SQLContext.createDataFrame behaves differently in Scala and Python:

    l = [('Alice', 1)]
    sqlContext.createDataFrame(l).collect()
    [Row(_1=u'Alice', _2=1)]
    sqlContext.createDataFrame(l, ['name', 'age']).collect()
    [Row(name=u'Alice', age=1)]

and in Scala:

    scala> val data = List(("Alice", 1), ("Wonderland", 0))
    scala> sqlContext.createDataFrame(data, List("name", "score"))
    <console>:28: error: overloaded method value createDataFrame with alternatives: ... cannot be applied to ...

What do you think about also allowing a Seq of column names in Scala, for the sake of consistency?

Regards,
Olivier.
Re: createDataFrame allows column names as second param in Python not in Scala
I have the perfect counterexample, where some of the data scientists prototype in Python and the production material is done in Scala. But I get your point; as a matter of fact, I realised that the toDF method took parameters a little while after posting this.

However, toDF still requires you to go from a List to an RDD, or to create a useless DataFrame and call toDF on it, re-creating a complete data structure. I just feel that createDataFrame(_: Seq) is not really useful as it is, because I think there are practically no circumstances where you'd want to create a DataFrame without column names.

I'm not suggesting that an n-th overloaded method should be created, but rather that the signature of the existing method be changed to take an optional Seq of column names.

Regards,
Olivier.
createDataFrame allows column names as second param in Python not in Scala
Hi everyone,

SQLContext.createDataFrame behaves differently in Scala and Python:

    l = [('Alice', 1)]
    sqlContext.createDataFrame(l).collect()
    [Row(_1=u'Alice', _2=1)]
    sqlContext.createDataFrame(l, ['name', 'age']).collect()
    [Row(name=u'Alice', age=1)]

and in Scala:

    scala> val data = List(("Alice", 1), ("Wonderland", 0))
    scala> sqlContext.createDataFrame(data, List("name", "score"))
    <console>:28: error: overloaded method value createDataFrame with alternatives: ... cannot be applied to ...

What do you think about also allowing a Seq of column names in Scala, for the sake of consistency?

Regards,
Olivier.
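For readers who want to see the naming behaviour from the Python session in isolation, here is a minimal sketch in plain Python. It is not Spark code and does not reflect how Spark implements createDataFrame: `create_rows` is a hypothetical helper invented for illustration, and it returns plain dicts rather than Row objects. It only models the idea under discussion, an optional list of column names that falls back to positional names (_1, _2, ...) when omitted.

```python
def create_rows(data, names=None):
    """Toy stand-in for a constructor that takes optional column names.

    Hypothetical helper, not part of Spark. Each record in `data` is a
    tuple; the result is one dict per record, keyed by column name.
    """
    rows = []
    for record in data:
        if names is None:
            # No names given: fall back to positional names _1, _2, ...
            cols = ["_%d" % (i + 1) for i in range(len(record))]
        else:
            cols = list(names)
        if len(cols) != len(record):
            raise ValueError(
                "expected %d column names, got %d" % (len(record), len(cols))
            )
        rows.append(dict(zip(cols, record)))
    return rows


l = [("Alice", 1)]
print(create_rows(l))                   # [{'_1': 'Alice', '_2': 1}]
print(create_rows(l, ["name", "age"]))  # [{'name': 'Alice', 'age': 1}]
```

In this toy model a single function with a defaulted parameter covers both calls, which is roughly the shape of the signature change suggested in the thread. Whether that is practical in the Scala API is a separate question.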
Re: createDataFrame allows column names as second param in Python not in Scala
Part of the reason is that it is really easy to just call toDF in Scala, and we already have a lot of createDataFrame functions.

(You might find some of the cross-language differences confusing, but I'd argue most real users just stick to one language, and developers or trainers are the only ones that need to constantly switch between languages).