Re: Spark Job not exited and shows running

2016-11-30 Thread ayan guha
Can you add sc.stop() at the end of the code and try? On 1 Dec 2016 18:03, "Daniel van der Ende" wrote: > Hi, > > I've seen this a few times too. Usually it indicates that your driver > doesn't have enough resources to process the result. Sometimes increasing > driver
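
A minimal Scala sketch of that suggestion, assuming a plain batch application (the original code is not shown in the thread) -- without the final sc.stop() the driver can keep the application registered as running:

import org.apache.spark.{SparkConf, SparkContext}

object MyJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("my-job"))
    val total = sc.parallelize(1 to 100).sum()   // stand-in for the actual work
    println(s"total = $total")
    sc.stop()   // release executors and let the application exit cleanly
  }
}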

Re: Spark Job not exited and shows running

2016-11-30 Thread Daniel van der Ende
Hi, I've seen this a few times too. Usually it indicates that your driver doesn't have enough resources to process the result. Sometimes increasing driver memory is enough (yarn memory overhead can also help). Is there any specific reason for you to run in client mode and not in cluster mode?
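
One way to apply that advice, sketched with the SparkLauncher API (the submission method is an assumption here; the same settings can be passed to spark-submit or set in spark-defaults.conf). Note that spark.driver.memory has to be set before the driver JVM starts, so in client mode it cannot be changed from inside the application itself:

import org.apache.spark.launcher.SparkLauncher

// Jar path and class name are hypothetical, shown only to illustrate the settings.
val handle = new SparkLauncher()
  .setAppResource("/path/to/my-app.jar")
  .setMainClass("com.example.MyJob")
  .setMaster("yarn")
  .setDeployMode("cluster")                             // run the driver on the cluster
  .setConf(SparkLauncher.DRIVER_MEMORY, "4g")           // spark.driver.memory
  .setConf("spark.yarn.driver.memoryOverhead", "1024")  // off-heap overhead in MB (Spark 2.0-era key)
  .startApplication()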

Re: Spark 2.0.2, using DStreams in Spark Streaming. How do I create SQLContext? Please help

2016-11-30 Thread Deepak Sharma
In Spark 2.0+, SparkSession was introduced, which you can use to query Hive as well. Just make sure you create the SparkSession with the enableHiveSupport() option. Thanks Deepak On Thu, Dec 1, 2016 at 12:27 PM, shyla deshpande wrote: > I am Spark 2.0.2 , using DStreams
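
A minimal Scala sketch of that suggestion (the table name is hypothetical):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("streaming-with-hive")
  .enableHiveSupport()   // gives the session access to the Hive metastore
  .getOrCreate()

spark.sql("SELECT count(*) FROM some_hive_table").show()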

Re: SPARK-SUBMIT and optional args like -h etc

2016-11-30 Thread Daniel van der Ende
Hi, Looks like the ordering of your parameters to spark-submit is different on Windows vs EMR. I assume the -h flag is an argument for your Python script? In that case you'll need to put the arguments after the Python script. Daniel On 1 Dec 2016 6:24 a.m., "Patnaik, Vandana"

Spark 2.0.2, using DStreams in Spark Streaming. How do I create SQLContext? Please help

2016-11-30 Thread shyla deshpande
I am using Spark 2.0.2 with DStreams because I need the Cassandra sink. How do I create SQLContext? I get the warning that SQLContext is deprecated. [image: Inline image 1] Thanks
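
For reference, a hedged Scala sketch of the usual Spark 2.x pattern for getting DataFrame/SQL functionality inside a DStream without the deprecated SQLContext: obtain (or reuse) a SparkSession inside foreachRDD. The case class, stream and field names are made up for illustration:

import org.apache.spark.sql.SparkSession

case class Event(id: String, value: Double)   // illustrative schema

// dstream: DStream[Event], e.g. the stream feeding the Cassandra sink (assumed)
dstream.foreachRDD { rdd =>
  val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
  import spark.implicits._
  val df = rdd.toDF()                 // DataFrame without going through SQLContext
  df.createOrReplaceTempView("events")
  spark.sql("SELECT id, avg(value) FROM events GROUP BY id").show()
}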

Re: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-11-30 Thread kant kodali
Here is another transformation that might cause the error, but it has to be one of these two since I only have two transformations:

jsonMessagesDStream
    .window(new Duration(6), new Duration(1000))
    .mapToPair(new PairFunction() {
        @Override
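
Not a diagnosis of the error, but one detail in the snippet above: Duration takes milliseconds, so new Duration(6) declares a 6 ms window with a 1000 ms slide. A Scala sketch of a well-formed window (the batch interval, lengths and key extraction are assumptions; window and slide should both be multiples of the batch interval):

import org.apache.spark.streaming.Seconds

// assuming a StreamingContext with a 1-second batch interval
val windowed = jsonMessagesDStream
  .window(Seconds(60), Seconds(1))        // 60 s window, recomputed every second
  .map(msg => (extractKey(msg), msg))     // extractKey is a hypothetical stand-in for the PairFunction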

Re: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-11-30 Thread kant kodali
Hi Marco, Here is what my code looks like:

Config config = new Config("hello");
SparkConf sparkConf = config.buildSparkConfig();
sparkConf.setJars(JavaSparkContext.jarOfClass(Driver.class));
JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, new

Re: updateStateByKey -- when the key is multi-column (like a composite key)

2016-11-30 Thread shyla deshpande
Thanks Miguel for the response. Works great. I am using a tuple for my key, the values are Strings, and I am returning a String from updateStateByKey. On Wed, Nov 30, 2016 at 12:33 PM, Miguel Morales wrote: > I *think* you can return a map to updateStateByKey which would
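
A minimal Scala sketch of that pattern -- a composite (tuple) key with a String state; names and the update rule are illustrative:

// pairs: DStream[((String, String), String)], e.g. (userId, deviceId) -> payload (assumed)
// ssc: the StreamingContext (assumed); updateStateByKey requires checkpointing
ssc.checkpoint("/tmp/checkpoints")

def updateFunc(newValues: Seq[String], state: Option[String]): Option[String] =
  newValues.lastOption.orElse(state)   // keep the latest value, else the previous state

val stateStream = pairs.updateStateByKey(updateFunc _)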

Re: Spark Standalone Cluster - Running applications in JSON format

2016-11-30 Thread Carl Ballantyne
8080 is just the normal web UI. It has the information I want, i.e. running applications, but in HTML format. I want it in JSON so I don't have to scrape and parse HTML. From my understanding api/v1/applications should do the trick ... except it doesn't. Ah well. On 1/12/2016 4:00

Re: Spark Standalone Cluster - Running applications in JSON format

2016-11-30 Thread Miguel Morales
Don't have a Spark cluster up to verify this, but try port 8080. http://spark-master-ip:8080/api/v1/applications. But glad to hear you're getting somewhere, best of luck. On Wed, Nov 30, 2016 at 9:59 PM, Carl Ballantyne wrote: > Hmmm getting closer I think. > > I

Re: Spark Standalone Cluster - Running applications in JSON format

2016-11-30 Thread Carl Ballantyne
Hmmm getting closer I think. I thought this was only for Mesos and Yarn clusters (from reading the documentation). I tried anyway and initially received Connection Refused. So I ran ./start-history-server.sh. This was on the Spark Master instance. I now get 404 not found. Nothing in the

Re: Spark Standalone Cluster - Running applications in JSON format

2016-11-30 Thread Miguel Morales
Try hitting: http://<your-server>:18080/api/v1 Then hit /applications. That should give you a list of running Spark jobs on a given server. On Wed, Nov 30, 2016 at 9:30 PM, Carl Ballantyne wrote: > > Yes I was looking at this. But it says I need to access the driver - >
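
For reference, a small Scala sketch of hitting that endpoint (host and port are assumptions; a running driver serves the same resource on port 4040, the history server on 18080):

import scala.io.Source

val url = "http://my-history-server:18080/api/v1/applications"   // hypothetical host
val json = Source.fromURL(url).mkString
println(json)   // a JSON array with one object per application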

Re: Spark Standalone Cluster - Running applications in JSON format

2016-11-30 Thread Carl Ballantyne
Yes I was looking at this. But it says I need to access the driver - http://<driver-node>:4040. I don't have a running Spark driver instance since I am submitting jobs to Spark using the SparkLauncher class. Or maybe I am missing something obvious. Apologies if so. On 1/12/2016 3:21 PM, Miguel

SPARK-SUBMIT and optional args like -h etc

2016-11-30 Thread Patnaik, Vandana
Hello All, I am new to Spark and am wondering how to pass an optional argument to my Python program using spark-submit. This works fine on my local machine but not on AWS EMR.

On Windows:
C:\Vandana\spark\examples>..\bin\spark-submit new_profile_csv1.py -h 0 -t example_float.txt

On EMR:

Re: Spark Standalone Cluster - Running applications in JSON format

2016-11-30 Thread Miguel Morales
Check the Monitoring and Instrumentation API: http://spark.apache.org/docs/latest/monitoring.html On Wed, Nov 30, 2016 at 9:20 PM, Carl Ballantyne wrote: > Hi All, > > I want to get the running applications for my Spark Standalone cluster in > JSON format. The same

Spark Standalone Cluster - Running applications in JSON format

2016-11-30 Thread Carl Ballantyne
Hi All, I want to get the running applications for my Spark Standalone cluster in JSON format. The same information displayed on the web UI on port 8080 ... but in JSON. Is there an easy way to do this? It seems I need to scrape the HTML page in order to get this information. The reason I

Re: SVM regression in Spark

2016-11-30 Thread roni
Hi Spark experts, Can anyone help with doing SVR (support vector machine regression) in Spark? Thanks R On Tue, Nov 29, 2016 at 6:50 PM, roni wrote: > Hi All, > I am trying to change my R code to Spark. I am using SVM regression in R. > It seems like Spark is providing

Re: PySpark to remote cluster

2016-11-30 Thread Felix Cheung
Spark 2.0.1 is running with a different py4j library than Spark 1.6. You will probably run into other problems mixing versions though - is there a reason you can't run Spark 1.6 on the client? From: Klaus Schaefers

Unsubscribe

2016-11-30 Thread Sivakumar S

Re: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-11-30 Thread Marco Mistroni
Could you paste a reproducible code snippet? Kr On 30 Nov 2016 9:08 pm, "kant kodali" wrote: > I have a lot of these exceptions happening > > java.lang.Exception: Could not compute split, block input-0-1480539568000 > not found > > > Any ideas what this could be? >

java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-11-30 Thread kant kodali
I have a lot of these exceptions happening:

java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

Any ideas what this could be?

Re: updateStateByKey -- when the key is multi-column (like a composite key )

2016-11-30 Thread Miguel Morales
I *think* you can return a map to updateStateByKey which would include your fields. Another approach would be to create a hash (e.g. create a JSON version of the hash and return that). On Wed, Nov 30, 2016 at 12:30 PM, shyla deshpande wrote: > updateStateByKey - Can

updateStateByKey -- when the key is multi-column (like a composite key)

2016-11-30 Thread shyla deshpande
updateStateByKey - can this be used when the key is multi-column (like a composite key) and the value is not numeric? All the examples I have come across are ones where the key is a simple String and the value is numeric. Appreciate any help. Thanks

Save the date: ApacheCon Miami, May 15-19, 2017

2016-11-30 Thread Rich Bowen
Dear Apache enthusiast, ApacheCon and Apache Big Data will be held at the Intercontinental in Miami, Florida, May 16-18, 2017. Submit your talks, and register, at http://apachecon.com/ Talks aimed at the Big Data section of the event should go to

Re: Can't read tables written in Spark 2.1 in Spark 2.0 (and earlier)

2016-11-30 Thread Reynold Xin
This should fix it: https://github.com/apache/spark/pull/16080 On Wed, Nov 30, 2016 at 10:55 AM, Timur Shenkao wrote: > Hello, > > Yes, I used hiveContext, sqlContext, sparkSession from Java, Scala, > Python. > Via spark-shell, spark-submit, IDE (PyCharm, Intellij IDEA). >

Re: Can't read tables written in Spark 2.1 in Spark 2.0 (and earlier)

2016-11-30 Thread Timur Shenkao
Hello, Yes, I used hiveContext, sqlContext, sparkSession from Java, Scala, Python. Via spark-shell, spark-submit, IDE (PyCharm, IntelliJ IDEA). Everything works because I have a Hadoop cluster with a configured & tuned HIVE. The reason for Michael's error is usually a misconfigured or absent HIVE.

Re: Can't read tables written in Spark 2.1 in Spark 2.0 (and earlier)

2016-11-30 Thread Gourav Sengupta
Hi Timur, did you use hiveContext or sqlContext or the Spark way mentioned in http://spark.apache.org/docs/latest/sql-programming-guide.html? Regards, Gourav Sengupta On Wed, Nov 30, 2016 at 5:35 PM, Yin Huai wrote: > Hello Michael, > > Thank you for reporting this

SPARK 2.0 CSV exports (https://issues.apache.org/jira/browse/SPARK-16893)

2016-11-30 Thread Gourav Sengupta
Hi Sean, I think that the main issue was users importing the package while starting Spark, just the way we used to do in Spark 1.6. After removing that option from --packages while starting Spark 2.0, the issue of conflicting libraries disappeared. I have written about this in
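
For anyone hitting the same conflict, a short Scala sketch of the built-in CSV source in Spark 2.0, which needs no external --packages option (paths, options and the spark session are assumed):

val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/data/input.csv")          // hypothetical input path

df.write
  .option("header", "true")
  .csv("/data/output_csv")         // hypothetical output directory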

Re: Can't read tables written in Spark 2.1 in Spark 2.0 (and earlier)

2016-11-30 Thread Yin Huai
Hello Michael, Thank you for reporting this issue. It will be fixed by https://github.com/apache/spark/pull/16080. Thanks, Yin On Tue, Nov 29, 2016 at 11:34 PM, Timur Shenkao wrote: > Hi! > > Do you have real HIVE installation? > Have you built Spark 2.1 & Spark 2.0 with

Parallel dynamic partitioning producing duplicated data

2016-11-30 Thread Mehdi Ben Haj Abbes
Hi Folks, I have a Spark job reading a CSV file into a dataframe. I register that dataframe as a tempTable, then I write that dataframe/tempTable to a Hive external table (using Parquet format for storage). I'm using this kind of command: hiveContext.sql("INSERT INTO TABLE t
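
A hedged Scala sketch of the pattern being described (table, column and partition names are made up; this is not the poster's actual job):

// df: the DataFrame read from the CSV file (assumed); hiveContext: a HiveContext (assumed)
df.registerTempTable("tmp_csv")   // createOrReplaceTempView in Spark 2.x

hiveContext.sql("SET hive.exec.dynamic.partition=true")
hiveContext.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
hiveContext.sql(
  """INSERT INTO TABLE target_table PARTITION (dt)
     SELECT col1, col2, dt FROM tmp_csv""")   // the partition column comes last in the SELECT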

PySpark to remote cluster

2016-11-30 Thread Klaus Schaefers
Hi, I want to connect from a local Jupyter Notebook to a remote Spark cluster. The cluster is running Spark 2.0.1 and the Jupyter notebook is based on Spark 1.6, running in a Docker image (Link). I try to init the SparkContext like this:

import pyspark
sc =

Can I have two different receivers for my Spark client program?

2016-11-30 Thread kant kodali
Hi All, I am wondering if it makes sense to have two receivers inside my Spark client program? The use case is as follows.
1) We have to support a feed from Kafka, so this will be direct receiver #1. We need to perform batch inserts from the Kafka feed to Cassandra.
2) A gRPC receiver where we
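
It can work in principle: a single StreamingContext can have several input DStreams. A hedged Scala sketch combining the Kafka 0.10 direct stream (strictly speaking not a receiver) with a custom receiver; MyGrpcReceiver, topic and broker names are all made up:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

val ssc = new StreamingContext(new SparkConf().setAppName("two-inputs"), Seconds(1))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group")

// input #1: direct Kafka stream, batch-inserted into Cassandra downstream
val kafkaStream = KafkaUtils.createDirectStream[String, String](
  ssc, LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

// input #2: a custom receiver wrapping the gRPC feed (MyGrpcReceiver is hypothetical)
val grpcStream = ssc.receiverStream(new MyGrpcReceiver("grpc-host", 50051))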

Unsubscribe

2016-11-30 Thread Aditya
Unsubscribe - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Logistic regression using gradient ascent

2016-11-30 Thread Meeraj Kunnumpurath
Hello, I have been trying to implement logistic regression using gradient ascent, out of curiosity. I am using the Spark ML feature extraction packages and data frames, and not any of the implemented algorithms. I would be grateful if any of you could cast an eye over it and provide some feedback.
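
Since the code itself is not in the digest, here is a small, self-contained Scala sketch of logistic regression by gradient ascent on the log-likelihood (toy data and plain RDDs; not the poster's implementation):

import org.apache.spark.sql.SparkSession

object GradientAscentLR {
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("lr-gradient-ascent").master("local[*]").getOrCreate()
    // toy data: (label, features) with labels in {0, 1}
    val data = spark.sparkContext.parallelize(Seq(
      (1.0, Array(1.0, 2.0)),
      (0.0, Array(-1.0, -0.5)),
      (1.0, Array(2.0, 1.5)),
      (0.0, Array(-2.0, -1.0)))).cache()

    val numIterations = 100
    val stepSize = 0.1
    var w = Array.fill(2)(0.0)   // one weight per feature

    for (_ <- 1 to numIterations) {
      // gradient of the log-likelihood: sum over points of (y - sigmoid(w.x)) * x
      val grad = data.map { case (y, x) =>
        val margin = w.zip(x).map { case (wi, xi) => wi * xi }.sum
        val err = y - sigmoid(margin)
        x.map(_ * err)
      }.reduce((a, b) => a.zip(b).map { case (ai, bi) => ai + bi })
      // ascent step: move along the gradient to maximise the likelihood
      w = w.zip(grad).map { case (wi, gi) => wi + stepSize * gi }
    }

    println(s"weights: ${w.mkString(", ")}")
    spark.stop()
  }
}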