RE: pickling a udf

2019-04-04 Thread Adaryl Wakefield
Its running in local mode. I’ve ran it in PyCharm and JupyterLab. I’ve restarted the kernel several times. B. From: Abdeali Kothari Sent: Thursday, April 4, 2019 06:35 To: Adaryl Wakefield Cc: user@spark.apache.org Subject: Re: pickling a udf The syntax looks right. Are you still getting

pickling a udf

2019-04-04 Thread Adaryl Wakefield
Are we not supposed to be using udfs anymore? I copied an example straight from a book and I'm getting weird results, and I think it's because the book is using a much older version of Spark. The code below is pretty straightforward but I'm getting an error nonetheless. I've been doing a
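The original error isn't quoted here, but the usual mechanics behind UDF pickling problems: PySpark must serialize the Python function to ship it to worker processes. It bundles cloudpickle, which can serialize closures that the stdlib pickle cannot; either way, anything the function captures (a SparkContext, an open connection, a class defined only in a notebook cell) must itself be serializable. A small stdlib-only demonstration of that boundary (toy function, not the poster's code):

```python
import pickle

def make_udf(scale):
    # A closure: 'scaled' captures 'scale' from the enclosing scope.
    def scaled(x):
        return x * scale
    return scaled

udf_fn = make_udf(10)
assert udf_fn(3) == 30

# Stdlib pickle serializes functions by importable name, so a closure
# (which has no importable name) is rejected.  cloudpickle, which
# PySpark bundles, serializes the function body instead and would
# succeed here -- but even cloudpickle fails if an object the function
# captures is itself unserializable.
try:
    pickle.dumps(udf_fn)
    stdlib_can_pickle = True
except Exception:
    stdlib_can_pickle = False

assert stdlib_can_pickle is False
```

So when a UDF fails to pickle, the fix is usually to stop capturing the offending object and instead create it inside the function, or pass in only plain serializable values.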

SparklyR and the Tidyverse

2017-10-17 Thread Adaryl Wakefield
I'm curious about the inner technical workings of SparklyR. Let's say you have: titanic_train = spark_read_csv(sc, name="titanic_train", path="../Data/titanic_train.csv", header = TRUE, delimiter = ",", quote = "\"", escape = "\\", charset = "UTF-8", null_value = NULL, repartition = 0, memory =

RE: how do you deal with datetime in Spark?

2017-10-03 Thread Adaryl Wakefield
akefieldmba<http://www.linkedin.com/in/bobwakefieldmba> Twitter: @BobLovesData<http://twitter.com/BobLovesData> From: Steve Loughran [mailto:ste...@hortonworks.com] Sent: Tuesday, October 3, 2017 2:19 PM To: Adaryl Wakefield <adaryl.wakefi...@hotmail.com> Cc: user@spark.apache.org Subject:

RE: how do you deal with datetime in Spark?

2017-10-03 Thread Adaryl Wakefield
://www.linkedin.com/in/bobwakefieldmba> Twitter: @BobLovesData<http://twitter.com/BobLovesData> From: Nicholas Hakobian [mailto:nicholas.hakob...@rallyhealth.com] Sent: Tuesday, October 3, 2017 1:04 PM To: Adaryl Wakefield <adaryl.wakefi...@hotmail.com> Cc: user@spark.apach

how do you deal with datetime in Spark?

2017-10-03 Thread Adaryl Wakefield
I gave myself a project to start actually writing Spark programs. I'm using Scala and Spark 2.2.0. In my project, I had to do some grouping and filtering by dates. It was awful and took forever. I was trying to use dataframes and SQL as much as possible. I see that there are date functions in
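Spark 2.2 does ship the needed pieces in `org.apache.spark.sql.functions` (`to_date`, `year`, `month`, `date_add`, combined with `groupBy`/`agg` on DataFrames). As a language-neutral illustration of the computation those functions express, here is the same month-level grouping done in plain Python over toy records (the data shape is assumed for the example):

```python
from collections import defaultdict
from datetime import datetime

# Toy records: (date string, amount).  In Spark these would live in a
# DataFrame, and to_date()/year()/month() plus groupBy().agg() would
# express the same computation natively on columns.
rows = [
    ("2017-01-15", 10.0),
    ("2017-01-20", 5.0),
    ("2017-02-03", 7.5),
]

totals = defaultdict(float)
for ts, amount in rows:
    d = datetime.strptime(ts, "%Y-%m-%d").date()
    totals[(d.year, d.month)] += amount   # group key: (year, month)

assert totals[(2017, 1)] == 15.0
assert totals[(2017, 2)] == 7.5
```

Keeping the dates as proper date-typed columns (via `to_date`) rather than strings is what makes the built-in grouping and filtering functions usable, and it avoids hand-rolled comparisons.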

RE: using R with Spark

2017-09-25 Thread Adaryl Wakefield
nkedin.com/in/bobwakefieldmba> Twitter: @BobLovesData<http://twitter.com/BobLovesData> From: Felix Cheung [mailto:felixcheun...@hotmail.com] Sent: Sunday, September 24, 2017 6:56 PM To: Adaryl Wakefield <adaryl.wakefi...@hotmail.com>; user@spark.apache.org Subject: Re: using R

RE: using R with Spark

2017-09-24 Thread Adaryl Wakefield
a> Twitter: @BobLovesData<http://twitter.com/BobLovesData> From: Georg Heiler [mailto:georg.kf.hei...@gmail.com] Sent: Sunday, September 24, 2017 3:39 PM To: Felix Cheung <felixcheun...@hotmail.com>; Adaryl Wakefield <adaryl.wakefi...@hotmail.com>; user@spark.apache.org S

using R with Spark

2017-09-24 Thread Adaryl Wakefield
There are two packages: SparkR and sparklyr. sparklyr seems to be the more useful. However, do you have to pay to use it? Unless I'm not reading this right, it seems you have to have the paid version of RStudio to use it. Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics, LLC

Python vs. Scala

2017-09-05 Thread Adaryl Wakefield
Is there any performance difference in writing your application in Python vs. Scala? I’ve resisted learning Python because it’s an interpreted scripting language, but the market seems to be demanding Python skills. Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics, LLC 913.938.6685

RE: real world spark code

2017-07-25 Thread Adaryl Wakefield
Twitter: @BobLovesData<http://twitter.com/BobLovesData> From: Jörn Franke [mailto:jornfra...@gmail.com] Sent: Tuesday, July 25, 2017 8:31 AM To: Adaryl Wakefield <adaryl.wakefi...@hotmail.com> Cc: user@spark.apache.org Subject: Re: real world spark code Look for the ones that have

real world spark code

2017-07-24 Thread Adaryl Wakefield
Anybody know of publicly available GitHub repos of real world Spark applications written in scala? Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics, LLC 913.938.6685 www.massstreet.net

kafka and spark integration

2017-03-22 Thread Adaryl Wakefield
I'm a little confused on how to use Kafka and Spark together. Where exactly does Spark lie in the architecture? Does it sit on the other side of the Kafka producer? Does it feed the consumer? Does it pull from the consumer? Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics, LLC
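In the Structured Streaming model, Spark sits on the consumer side: it subscribes to topics and pulls records from the Kafka brokers, downstream of the producers. A sketch of the PySpark API, in pseudocode form (not runnable as-is: it assumes an existing SparkSession `spark`, a placeholder broker address and topic name, and the `spark-sql-kafka` connector package on the classpath):

```
# Spark acts as a Kafka consumer, pulling from the brokers.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder address
      .option("subscribe", "events")                      # placeholder topic
      .load())

# Kafka records arrive as binary key/value columns; cast to use them.
values = df.selectExpr("CAST(value AS STRING)")
```

So the architecture is producers -> Kafka brokers -> Spark (consuming) -> wherever Spark writes its results.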

RE: finding Spark Master

2017-03-07 Thread Adaryl Wakefield
Principal Mass Street Analytics, LLC 913.938.6685 www.massstreet.net<http://www.massstreet.net> www.linkedin.com/in/bobwakefieldmba<http://www.linkedin.com/in/bobwakefieldmba> Twitter: @BobLovesData From: Koert Kuipers [mailto:ko...@tresata.com] Sent: Tuesday, March 7, 2017 7:47 P

RE: finding Spark Master

2017-03-07 Thread Adaryl Wakefield
r: @BobLovesData From: ayan guha [mailto:guha.a...@gmail.com] Sent: Tuesday, March 7, 2017 5:59 PM To: Adaryl Wakefield <adaryl.wakefi...@hotmail.com>; user@spark.apache.org Subject: Re: finding Spark Master yarn-client or yarn-cluster On Wed, 8 Mar 2017 at 10:28 am, Adaryl Wakefield &

finding Spark Master

2017-03-07 Thread Adaryl Wakefield
I'm running a three-node cluster with Spark and Hadoop as part of an HDP stack. How do I find my Spark Master? I'm just seeing the clients. I'm trying to figure out what goes in setMaster() aside from local[*]. Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics, LLC
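On an HDP stack, Spark runs on YARN rather than a standalone master, so there is no spark://host:port URL to look for; the master value is simply `yarn`. An illustrative fragment of the two usual ways to say so (app name and script name are placeholders; deploy mode is a choice):

```
# On the command line:
spark-submit --master yarn --deploy-mode client myapp.py

# Or in code (Spark 2.x), as pseudocode:
spark = SparkSession.builder.master("yarn").appName("myapp").getOrCreate()
```

With `--master yarn`, YARN's ResourceManager allocates the executors, which is why no standalone master process shows up among the HDP services.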

RE: using spark to load a data warehouse in real time

2017-03-07 Thread Adaryl Wakefield
Tariq, Mohammad about.me/mti <http://about.me/mti> On Wed, Mar 1, 2017 at 12:15 AM, Adaryl Wakefield <adaryl.wakefi...@hotmail.com> wrote:

RE: using spark to load a data warehouse in real time

2017-03-04 Thread Adaryl Wakefield
/in/bobwakefieldmba> Twitter: @BobLovesData From: Sam Elamin [mailto:hussam.ela...@gmail.com] Sent: Wednesday, March 1, 2017 2:29 AM To: Adaryl Wakefield <adaryl.wakefi...@hotmail.com>; Jörn Franke <jornfra...@gmail.com> Cc: user@spark.apache.org Subject: Re: using spark to load

RE: using spark to load a data warehouse in real time

2017-03-04 Thread Adaryl Wakefield
ass Street Analytics, LLC 913.938.6685 www.massstreet.net<http://www.massstreet.net> www.linkedin.com/in/bobwakefieldmba<http://www.linkedin.com/in/bobwakefieldmba> Twitter: @BobLovesData From: Jörn Franke [mailto:jornfra...@gmail.com] Sent: Wednesday, March 1, 2017 1:25 AM To: Adaryl Wa

RE: using spark to load a data warehouse in real time

2017-02-28 Thread Adaryl Wakefield
ad Tariq [mailto:donta...@gmail.com] Sent: Tuesday, February 28, 2017 12:57 PM To: Adaryl Wakefield <adaryl.wakefi...@hotmail.com> Cc: user@spark.apache.org Subject: Re: using spark to load a data warehouse in real time Hi Adaryl, You could definitely load data into a warehouse through S

RE: using spark to load a data warehouse in real time

2017-02-28 Thread Adaryl Wakefield
ail.com] Sent: Tuesday, February 28, 2017 4:13 AM To: Adaryl Wakefield <adaryl.wakefi...@hotmail.com> Cc: user@spark.apache.org Subject: Re: using spark to load a data warehouse in real time Have you checked to see if there are any drivers to enable you to write to Greenplum directly fro

using spark to load a data warehouse in real time

2017-02-27 Thread Adaryl Wakefield
Is anybody using Spark streaming/SQL to load a relational data warehouse in real time? There isn't a lot of information on this use case out there. When I google real time data warehouse load, nothing I find is up to date. It's all turn of the century stuff and doesn't take into account
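One pattern in current Spark (Structured Streaming's `foreachBatch`, added in 2.4, so newer than this thread) is to write each micro-batch to the warehouse over JDBC. Sketched in PySpark-style pseudocode; it is not runnable as-is: it assumes an existing streaming DataFrame `stream_df`, placeholder JDBC URL and table names, and a suitable JDBC driver on the classpath:

```
def write_batch(batch_df, batch_id):
    # Each micro-batch arrives as a normal DataFrame, so the ordinary
    # batch JDBC writer applies; mode("append") keeps loads additive.
    (batch_df.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://dw-host/warehouse")  # placeholder
        .option("dbtable", "fact_events")                      # placeholder
        .mode("append")
        .save())

query = stream_df.writeStream.foreachBatch(write_batch).start()
```

The micro-batch boundary is also the natural place for the usual warehouse concerns (dedup keys, upserts via a staging table) that a straight append does not handle.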