unsubscribe

2018-03-27 Thread Nicholas Sharkey

Re: [PySpark SQL] sql function to_date and to_timestamp return the same data type

2018-03-15 Thread Nicholas Sharkey
unsubscribe

On Thu, Mar 15, 2018 at 8:00 PM, Alan Featherston Lago wrote:
> I'm a pretty new user of spark and I've run into this issue with the
> pyspark docs:
>
> The functions pyspark.sql.functions.to_date &&
> pyspark.sql.functions.to_timestamp
> behave in the same way.

Re: H2O DataFrame to Spark RDD/DataFrame

2017-01-12 Thread Nicholas Sharkey
Page 33 of the Sparkling Water Booklet:
http://docs.h2o.ai/h2o/latest-stable/h2o-docs/booklets/SparklingWaterBooklet.pdf

df = sqlContext.read.format("h2o").option("key",frame.frame_id).load()
df = sqlContext.read.format("h2o").load(frame.frame_id)

On Thu, Jan 12, 2017 at 1:17 PM, Md. Rezaul

Re: Spark ML : One hot Encoding for multiple columns

2016-11-13 Thread Nicholas Sharkey
Amen

> On Nov 13, 2016, at 7:55 PM, janardhan shetty wrote:
>
> These Jiras are still unresolved:
> https://issues.apache.org/jira/browse/SPARK-11215
>
> Also there is https://issues.apache.org/jira/browse/SPARK-8418
>
>> On Wed, Aug 17, 2016 at 11:15 AM, Nisha

Re: Finding a Spark Equivalent for Pandas' get_dummies

2016-11-11 Thread Nicholas Sharkey
I did get *some* help from Databricks in terms of programmatically grabbing the categorical variables, but I can't figure out where to go from here:

# Get all string cols/categorical cols
stringColList = [i[0] for i in df.dtypes if i[1] == 'string']

# generate OHEs for every col in
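
The column-selection step quoted in this message can be tried without a cluster. The sketch below is plain Python, not Spark: it assumes a hand-written list of (column name, type name) pairs in the shape that `df.dtypes` returns in PySpark, and the column names are hypothetical.

```python
# Hypothetical (column, type) pairs, shaped like the list PySpark's df.dtypes returns.
dtypes = [
    ("age", "int"),
    ("city", "string"),
    ("income", "double"),
    ("gender", "string"),
]

# Keep only the names of the string-typed columns, as in the quoted snippet.
string_col_list = [name for name, dtype in dtypes if dtype == "string"]

print(string_col_list)
```

On a real DataFrame the same comprehension would run over `df.dtypes` directly; the resulting list is what a later encoding step would loop over.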

Finding a Spark Equivalent for Pandas' get_dummies

2016-11-11 Thread Nicholas Sharkey
I have a dataset in which I need to convert some of the variables to dummy variables. The get_dummies function in Pandas works perfectly on smaller datasets, but since it collects the data, I'll always be bottlenecked by the master node. I've looked at Spark's OHE feature and while that will work in theory
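
As a rough illustration of why dummy encoding does not have to collect the whole dataset, here is a minimal sketch in plain Python (not Spark, and not the approach from the thread). It assumes a two-pass scheme: the first pass gathers only the distinct category values, and the second encodes each row independently, which is the part that could be distributed. The rows, the column name, and the helper functions are all hypothetical.

```python
# Hypothetical sample rows; in a cluster setting these would be partitioned.
rows = [
    {"id": 1, "color": "red"},
    {"id": 2, "color": "blue"},
    {"id": 3, "color": "red"},
]


def distinct_values(rows, column):
    """Pass 1: gather the distinct categories (small compared with the data)."""
    return sorted({row[column] for row in rows})


def encode_row(row, column, categories):
    """Pass 2: replace one categorical field with 0/1 indicator fields.

    Each row is encoded on its own, so this step needs no shared state
    beyond the category list.
    """
    encoded = {key: value for key, value in row.items() if key != column}
    for category in categories:
        encoded[f"{column}_{category}"] = 1 if row[column] == category else 0
    return encoded


categories = distinct_values(rows, "color")
dummies = [encode_row(row, "color", categories) for row in rows]

for row in dummies:
    print(row)
```

The `column_value` naming mirrors the prefix convention of get_dummies; only the category list, not the rows themselves, has to be gathered in one place.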