Python:Streaming Question

2014-12-21 Thread Samarth Mailinglist
I’m trying to run the stateful network word count at https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/stateful_network_wordcount.py using the command: ./bin/spark-submit examples/src/main/python/streaming/stateful_network_wordcount.py localhost I am also

Re: spark-submit question

2014-11-17 Thread Samarth Mailinglist
4:59 AM, Samarth Mailinglist mailinglistsama...@gmail.com wrote: I am trying to run a job written in python with the following command: bin/spark-submit --master spark://localhost:7077 /path/spark_solution_basic.py --py-files /path/*.py --files /path/config.properties I always get

Probability in Naive Bayes

2014-11-17 Thread Samarth Mailinglist
I am trying to use Naive Bayes for a project of mine in Python and I want to obtain the probability value after having built the model. Suppose I have two classes - A and B. Currently there is an API to find which class a sample belongs to (predict). Now, I want to find the probability of it
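
A rough sketch of how such a probability could be computed by hand, assuming a multinomial Naive Bayes model whose log class priors and log conditional probabilities are available (MLlib's NaiveBayesModel exposes these as pi and theta, as far as I know). The function name class_probabilities and the numbers below are hypothetical, made up purely for illustration, and this is plain Python rather than a Spark API:

```python
import math

def class_probabilities(log_priors, log_likelihoods, features):
    """Return the normalized posterior P(class | features) for each class.

    log_priors:      one log P(class) per class
    log_likelihoods: one row per class holding log P(feature | class)
    features:        feature counts for a single sample
    """
    # Unnormalized log posterior: log P(c) + sum_j x_j * log P(j | c)
    scores = [
        prior + sum(x * w for x, w in zip(features, weights))
        for prior, weights in zip(log_priors, log_likelihoods)
    ]
    # Subtract the largest score before exponentiating to avoid underflow
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical two-class model (A and B) with two features
log_priors = [math.log(0.5), math.log(0.5)]
log_likelihoods = [
    [math.log(0.8), math.log(0.2)],  # class A
    [math.log(0.3), math.log(0.7)],  # class B
]
probs = class_probabilities(log_priors, log_likelihoods, [2, 1])
print(probs)  # probabilities for A and B, summing to 1
```

The class that predict would return corresponds to the largest entry; the normalization step is what turns the per-class scores into probabilities.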

Re: Functions in Spark

2014-11-16 Thread Samarth Mailinglist
Check this video out: https://www.youtube.com/watch?v=dmL0N3qfSc8&list=UURzsq7k4-kT-h3TDUBQ82-w On Mon, Nov 17, 2014 at 9:43 AM, Deep Pradhan pradhandeep1...@gmail.com wrote: Hi, Is there any way to know which of my functions perform better in Spark? In other words, say I have achieved same

Re: Scala vs Python performance differences

2014-11-12 Thread Samarth Mailinglist
I was about to ask this question. On Wed, Nov 12, 2014 at 3:42 PM, Andrew Ash and...@andrewash.com wrote: Jeremy, Did you complete this benchmark in a way that's shareable with those interested here? Andrew On Tue, Apr 15, 2014 at 2:50 PM, Nicholas Chammas nicholas.cham...@gmail.com

Re: Read a HDFS file from Spark source code

2014-11-11 Thread Samarth Mailinglist
Instead of a file path, use an HDFS URI. For example (in Python): data = sc.textFile("hdfs://localhost/user/someuser/data") On Wed, Nov 12, 2014 at 10:12 AM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi, I am trying to access a file in HDFS from Spark source code. Basically, I am

Re: Using mongo with PySpark

2014-06-04 Thread Samarth Mailinglist
, Samarth Mailinglist mailinglistsama...@gmail.com wrote: db = MongoClient()['spark_test_db'] *collec = db['programs']* def mapper(val): asc = val.encode('ascii','ignore') json = convertToJSON(asc, indexMap) collec.insert(json) # *this is not working* def convertToJSON(string