Re: Write only one output file in Spark SQL

2017-08-11 Thread Daniel van der Ende
Hi Asmath, Could you share the code you're running? Daniel On Fri, 11 Aug 2017, 17:53 KhajaAsmath Mohammed wrote: Hi, I am using Spark SQL to write data back to HDFS and it is resulting in multiple output files. I tried changing number
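A common way to get a single output file is to collapse the data into one partition before writing. A minimal PySpark sketch, assuming a Spark 2.x session and hypothetical HDFS paths:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("single-file-write").getOrCreate()

# Hypothetical input path for illustration.
df = spark.read.parquet("hdfs:///data/input")

# coalesce(1) merges everything into one partition, so the write produces
# exactly one part file. Caveat: all data is funneled through a single
# task, which can be slow or run out of memory for large datasets.
df.coalesce(1).write.mode("overwrite").parquet("hdfs:///data/output")
```

`repartition(1)` achieves the same file count but triggers a full shuffle, which is usually unnecessary when only reducing partitions.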

Kafka 0.10 with PySpark

2017-07-04 Thread Daniel van der Ende
Hi, I'm working on integrating some PySpark code with Kafka. We'd like to use SSL/TLS, and so want to use Kafka 0.10. Because Structured Streaming is still marked alpha, we'd like to use Spark Streaming. On this page, however, it indicates that the Kafka 0.10 integration in Spark does not support
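For reference, SSL/TLS for a Kafka 0.10 client is configured through consumer properties. A sketch of the relevant settings as a Python dict (the property names follow the Kafka consumer configuration reference; the paths and passwords are placeholders):

```python
# Kafka 0.10+ SSL/TLS client properties. The truststore verifies the
# broker's certificate; the keystore is only needed if the broker
# requires client authentication (mutual TLS).
kafka_ssl_params = {
    "security.protocol": "SSL",
    "ssl.truststore.location": "/etc/kafka/certs/truststore.jks",  # placeholder path
    "ssl.truststore.password": "changeit",                         # placeholder secret
    "ssl.keystore.location": "/etc/kafka/certs/keystore.jks",      # placeholder path
    "ssl.keystore.password": "changeit",                           # placeholder secret
}
```

These params would be merged into whatever consumer configuration the chosen Spark–Kafka integration accepts.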

Re: Spark Job not exited and shows running

2016-11-30 Thread Daniel van der Ende
Hi, I've seen this a few times too. Usually it indicates that your driver doesn't have enough resources to process the result. Sometimes increasing driver memory is enough (increasing the YARN memory overhead can also help). Is there any specific reason for you to run in client mode and not in cluster mode?
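The two knobs mentioned above can be set on the `spark-submit` command line. A hypothetical invocation (the values are illustrative, and `spark.yarn.driver.memoryOverhead` is the YARN overhead property used in Spark 1.x/2.x):

```shell
# Give the driver more heap, plus extra off-heap overhead for the YARN
# container, and run in cluster mode so the driver runs on the cluster.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 8g \
  --conf spark.yarn.driver.memoryOverhead=1024 \
  my_job.py
```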

Re: SPARK-SUBMIT and optional args like -h etc

2016-11-30 Thread Daniel van der Ende
Hi, Looks like the ordering of your parameters to spark-submit is different on Windows vs EMR. I assume the -h flag is an argument for your Python script? In that case you'll need to put the arguments after the Python script. Daniel On 1 Dec 2016 6:24 a.m., "Patnaik, Vandana"
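The distinction matters because `spark-submit` treats everything before the application file as its own options, and everything after it as arguments passed through to the application. A hypothetical example (script name and flags are illustrative):

```shell
# Wrong: spark-submit tries to parse -h itself and fails or prints help.
#   spark-submit -h somehost my_script.py

# Right: anything after the script file is forwarded to my_script.py,
# so -h reaches the script's own argument parser.
spark-submit --master yarn my_script.py -h somehost
```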

Re: Do I have to wrap akka around spark streaming app?

2016-11-28 Thread Daniel van der Ende
Well, I would say it depends on what you're trying to achieve. Right now I don't know why you are considering using Akka. Could you please explain your use case a bit? In general, there is no single correct answer to your current question as it's quite broad. Daniel On Mon, Nov 28, 2016 at 9:11

Re: Unable to launch Python Web Application on Spark Cluster

2016-11-10 Thread Daniel van der Ende
Hi Anjali, It would help to see the code. But more importantly: why do you want to deploy a web application on a Spark cluster? Spark is meant for distributed, in-memory computations. I don't know what your application is doing, but it would make more sense to run it outside the Spark cluster,