Re: help in copying data from one azure subscription to another azure subscription

2018-05-23 Thread Pushkar.Gujar
What are you using to store data in those subscriptions, Data Lake or Blobs? Azure Data Factory is already available and can copy between these cloud storage services without having to go through Spark. Thank you, *Pushkar Gujar* On Mon, May 21, 2018 at 8:59 AM, amit kumar singh

Re: Multiple CSV libs causes issues spark 2.1

2017-05-09 Thread Pushkar.Gujar
> > df = spark.sqlContext.read.csv('out/df_in.csv') > shouldn't this be just - df = spark.read.csv('out/df_in.csv') SparkSession itself is the entry point to DataFrame and SQL functionality. Thank you, *Pushkar Gujar* On Tue, May 9, 2017 at 6:09 PM, Mark Hamstra

Re: Spark books

2017-05-03 Thread Pushkar.Gujar
*"I would suggest do not buy any book, just start with databricks community edition"* I don't agree with the above; the "Learning Spark" book was definitely a stepping stone for me. All the basics a beginner will need are covered in an easy-to-understand format, with examples. Great book!

Re: how to find the nearest holiday

2017-04-25 Thread Pushkar.Gujar
You can use - start_date_test2.holiday.getItem(0) I would highly suggest you look at the latest documentation - http://spark.apache.org/docs/latest/api/python/index.html Thank you, *Pushkar Gujar* On Tue, Apr 25, 2017 at 8:50 AM, Zeming Yu wrote: > How could I access

Re: udf that handles null values

2017-04-24 Thread Pushkar.Gujar
Someone had a similar issue today on Stack Overflow. http://stackoverflow.com/questions/43595201/python-how-to-convert-pyspark-column-to-date-type-if-there-are-null-values/43595728#43595728 Thank you, *Pushkar Gujar* On Mon, Apr 24, 2017 at 8:22 PM, Zeming Yu wrote: > hi

Re: question regarding pyspark

2017-04-21 Thread Pushkar.Gujar
Hi Afshin, If you need to associate header information from the 2nd file with the first one, you can do that by specifying a custom schema. Below is an example from the spark-csv package. As you can guess, you will have to do some pre-processing to create the customSchema by first reading the second file. val

Re: how to add new column using regular expression within pyspark dataframe

2017-04-20 Thread Pushkar.Gujar
Can be as simple as - from pyspark.sql.functions import split; flight.withColumn('hour', split(flight.duration, 'h').getItem(0)) Thank you, *Pushkar Gujar* On Thu, Apr 20, 2017 at 4:35 AM, Zeming Yu wrote: > Any examples? > > On 20 Apr. 2017 3:44 pm, "颜发才(Yan Facai)"

Re: Optimisation Tips

2017-04-12 Thread Pushkar.Gujar
Not an expert, but the groupByKey operation is well known to cause a lot of shuffling, and usually the work done with groupByKey can be replaced by reduceByKey. Here is a great article on the groupByKey operation -

Re: How to run a spark on Pycharm

2017-03-03 Thread Pushkar.Gujar
u please tell me a bit more in detail how to do that? > I installed IPython and Jupyter notebook on my local machine. But how can > I run the code using them? Before, I tried to run the code with PyCharm > but failed. > > Thanks, > Anahita > > On Fri, Mar 3, 2017 at 3:48

Re: How to run a spark on Pycharm

2017-03-03 Thread Pushkar.Gujar
Jupyter notebook/IPython can be connected to Apache Spark. Thank you, *Pushkar Gujar* On Fri, Mar 3, 2017 at 9:43 AM, Anahita Talebi wrote: > Hi everyone, > > I am trying to run a Spark code on PyCharm. I tried to give the path of > Spark as an environment variable
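One common way to wire this up (the paths are assumptions; adjust SPARK_HOME to your own installation) is to point PySpark's driver at Jupyter via environment variables:

```shell
# Assumed install location -- adjust to your machine.
export SPARK_HOME=/opt/spark
export PYTHONPATH="$SPARK_HOME/python:$PYTHONPATH"

# Launch pyspark inside a Jupyter notebook instead of the plain REPL.
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
"$SPARK_HOME/bin/pyspark"
```

The same variables can also be set in a PyCharm run configuration's environment, which addresses the original question about running Spark code from the IDE.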