Re: parsing embedded json in spark

2016-12-23 Thread Tal Grynbaum
Are you using Spark standalone, YARN, or Mesos? Thank You, Irving Duran. On Thu, Dec 22, 2016 at 1:42 AM, Tal Grynbaum <tal.grynb...@gmail.com> wrote: > Hi, > I have a dataframe that

parsing embedded json in spark

2016-12-21 Thread Tal Grynbaum
Hi, I have a dataframe that contains an embedded JSON string in one of its fields. I tried to write a UDF that parses it using lift-json, but it seems to take a very long time to process, and it seems that only the master node is working. Has anyone dealt with such a scenario before
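The thread is truncated, but the per-row work such a parsing UDF performs can be sketched in plain Python, with the stdlib `json` module standing in for lift-json; the field and column names here are hypothetical, purely for illustration:

```python
import json

# A row whose "payload" field holds an embedded JSON string
# (hypothetical column names for illustration).
row = {"id": 1, "payload": '{"event": "click", "value": 7}'}

def parse_payload(raw):
    """The work a parsing UDF does for every row: decode the
    embedded JSON string into a structured value."""
    return json.loads(raw)

parsed = parse_payload(row["payload"])
print(parsed["event"])  # -> click
print(parsed["value"])  # -> 7
```

One plausible reading of "only the master node is working": if the parse runs on the driver (for example inside a collect-then-loop), the executors sit idle; expressing it as a column transformation keeps the work distributed. Spark 2.1+ also ships a built-in `from_json` function that parses a string column against a schema without a user-defined function.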

Re: difference between package and jar Option in Spark

2016-09-04 Thread Tal Grynbaum
You need to download all the dependencies of that jar as well. On Mon, Sep 5, 2016, 06:59 Divya Gehlot wrote: > Hi, > I am using spark-csv to parse my input files. > If I use the --packages option it works fine, but if I download >

Re: any idea what this error could be?

2016-09-03 Thread Tal Grynbaum
My guess is that you're running out of memory somewhere. Try to increase the driver memory and/or executor memory. On Sat, Sep 3, 2016, 11:42 kant kodali wrote: > I am running this on aws. > On Fri, Sep 2, 2016 11:49 PM, kant kodali kanth...@gmail.com wrote: >> I am

Re: Scala Vs Python

2016-09-02 Thread Tal Grynbaum
In other words, even if Spark were rewritten in Python, and were to focus on Python only, you would still not get those features. -- *Tal Grynbaum* / *CTO & co-founder* m# +972-54-7875797 mobile retention done right

Suggestions for calculating MAU/WAU/DAU

2016-08-28 Thread Tal Grynbaum
if one of you can think of a better solution. Thanks, Tal

Re: Please assist: Building Docker image containing spark 2.0

2016-08-26 Thread Tal Grynbaum
Did you specify -Dscala-2.10? As in: ./dev/change-scala-version.sh 2.10 then ./build/mvn -Pyarn -Phadoop-2.4 -Dscala-2.10 -DskipTests clean package, if you're building with Scala 2.10. On Sat, Aug 27, 2016, 00:18 Marco Mistroni wrote: > Hello Michael > uhm, I celebrated too soon

Re: spark 2.0.0 - when saving a model to S3 spark creates temporary files. Why?

2016-08-25 Thread Tal Grynbaum
dir when jobs fail. > In the previous versions of Spark, there was a way to directly write data > in a destination though; > Spark v2.0+ has no way to do that because of the critical issue on S3 (see: > SPARK-10063). > // maropu > On Thu, Aug 25, 2016 at 2:40 PM, Tal Grynbaum

Re: spark 2.0.0 - when saving a model to S3 spark creates temporary files. Why?

2016-08-24 Thread Tal Grynbaum
I read somewhere that it's because S3 has to know the size of the file upfront. I don't really understand this: why is it OK not to know the size for the temp files but not OK for the final files? The delete permission is the minor disadvantage from my side; the worst thing is that I have a
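For context on why the temp files exist at all (hedged, based on the SPARK-10063 discussion referenced in the reply above): Hadoop-style output committers write each task's output to a temporary location and then rename it into place on commit. The pattern can be sketched in plain Python; on a POSIX filesystem the rename is a cheap, atomic metadata operation, whereas S3 has no rename, so the "temp then move" step turns into a copy plus delete:

```python
import os
import tempfile

def commit_by_rename(data: bytes, dest: str) -> None:
    """Sketch of the commit pattern output committers use: write the
    data to a temporary file first, then atomically rename it into
    its final destination. Readers never observe a half-written file."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(dest) or ".")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, dest)  # atomic rename on POSIX filesystems

commit_by_rename(b"model bytes", "model.bin")
print(open("model.bin", "rb").read())  # -> b'model bytes'
```

On S3 there is no equivalent atomic `os.replace`, which is why committers that stage through a temp directory behave poorly there, and why the earlier direct-write path was removed in Spark 2.0.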

Re: how to select first 50 value of each group after group by?

2016-07-06 Thread Tal Grynbaum
You can use the rank window function to rank each row in the group, and then filter the rows with rank <= 50. On Wed, Jul 6, 2016, 14:07 wrote: > hi there > I have a DF with 3 columns: id, pv, location. (the rows are already > grouped by location and sorted by pv desc) I wanna
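The window-function idea (rank within each location, keep the top rows) can be sketched in plain Python; the `(id, pv, location)` tuples below are invented sample data, and `n` plays the role of the 50 in the question:

```python
from itertools import groupby
from operator import itemgetter

# Sample rows: (id, pv, location) — hypothetical data shaped like
# the question's DF.
rows = [
    (1, 90, "US"), (2, 80, "US"), (3, 70, "US"),
    (4, 60, "EU"), (5, 50, "EU"),
]

def top_n_per_group(rows, n):
    """Rank rows within each location by pv descending and keep the
    first n — the same effect as rank() <= n over a window
    partitioned by location and ordered by pv desc."""
    out = []
    keyed = sorted(rows, key=itemgetter(2))  # group key: location
    for _, grp in groupby(keyed, key=itemgetter(2)):
        ranked = sorted(grp, key=itemgetter(1), reverse=True)
        out.extend(ranked[:n])
    return out

print(top_n_per_group(rows, 2))
# -> [(4, 60, 'EU'), (5, 50, 'EU'), (1, 90, 'US'), (2, 80, 'US')]
```

In Spark itself this corresponds to a window spec like `Window.partitionBy("location").orderBy(desc("pv"))`, adding a `rank()` column over that window, and filtering on it.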