pyspark script fails on EMR with an ERROR in configuring object.

2014-08-03 Thread Rahul Bhojwani
Hi, I used to run Spark scripts on my local machine. Now I am porting my code to EMR and I am facing lots of problems. The main one is that a Spark script which runs properly on my local machine gives an error when run on an Amazon EMR cluster. Here is the error: [image: Inline image 1]

Re: pyspark script fails on EMR with an ERROR in configuring object.

2014-08-03 Thread Rahul Bhojwani
) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820) at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89) ... 64 more On Sun, Aug 3, 2014 at 6:04 PM, Rahul Bhojwani rahulbhojwani2...@gmail.com wrote: Hi, I

Re: Can we get a spark context inside a mapper

2014-07-15 Thread Rahul Bhojwani
with SparkContext.wholeTextFiles and call Weka on each one. Matei On Jul 14, 2014, at 11:30 AM, Rahul Bhojwani rahulbhojwani2...@gmail.com wrote: I understand that the question is very unprofessional, but I am a newbie. If you could share some link where I can ask such questions, if not here. But please

Error in spark: Exception in thread delete Spark temp dir

2014-07-14 Thread Rahul Bhojwani
I am getting an error saying: Exception in thread delete Spark temp dir C:\Users\shawn\AppData\Local\Temp\spark-b4f1105c-d67b-488c-83f9-eff1d1b95786 java.io.IOException: Failed to delete: C:\Users\shawn\AppData\Local\Temp\spark-b4f1105c-d67b-488c-83f9-eff1d1b95786\tmppr36zu at

Can we get a spark context inside a mapper

2014-07-14 Thread Rahul Bhojwani
Hey, My question is for this situation: Suppose we have 10 files each containing list of features in each row. Task is that for each file cluster the features in that file and write the corresponding cluster along with it in a new file. So we have to generate 10 more files by applying

Re: Can we get a spark context inside a mapper

2014-07-14 Thread Rahul Bhojwani
I understand that the question is very unprofessional, but I am a newbie. If you could share some link where I can ask such questions, if not here. But please answer. On Mon, Jul 14, 2014 at 6:52 PM, Rahul Bhojwani rahulbhojwani2...@gmail.com wrote: Hey, My question is for this situation

Re: Does MLlib Naive Bayes implementation incorporates Laplase smoothing?

2014-07-10 Thread Rahul Bhojwani
there is a smoothing parameter, and yes from the looks of it it is simply additive / Laplace smoothing. It's been in there for a while. On Thu, Jul 10, 2014 at 6:55 AM, Rahul Bhojwani rahulbhojwani2...@gmail.com wrote: The discussion is in context for spark 0.9.1 Does MLlib Naive Bayes
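For readers following this thread, additive (Laplace) smoothing can be illustrated with a minimal pure-Python sketch. This is not MLlib's code; the function name `smoothed_log_probs` and the parameter name `lam` are hypothetical and chosen only for illustration. The idea is to add a constant to every feature count before normalizing, so that a feature never seen in a class still gets a non-zero probability.

```python
import math


def smoothed_log_probs(feature_counts, lam=1.0):
    """Return log P(feature | class) using additive smoothing.

    feature_counts: per-feature totals observed for one class.
    lam: smoothing constant (lam=1.0 is classic Laplace smoothing).
    """
    # Every feature receives `lam` extra pseudo-counts, so the
    # denominator grows by lam * (number of features).
    total = sum(feature_counts) + lam * len(feature_counts)
    return [math.log((count + lam) / total) for count in feature_counts]


# The second feature was never observed, yet its probability stays above zero.
log_probs = smoothed_log_probs([3, 0, 1], lam=1.0)
probs = [math.exp(lp) for lp in log_probs]
```

With `lam=0` the unseen feature would have probability zero and its logarithm would be undefined, which is the failure that smoothing avoids.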

Re: Does MLlib Naive Bayes implementation incorporates Laplase smoothing?

2014-07-10 Thread Rahul Bhojwani
(Bug according to me.) I m not trying to be selfish. Its just that if I get something that can help make my profile look strong then I shouldn't miss it at this stage. Thanks, On Thu, Jul 10, 2014 at 5:54 PM, Rahul Bhojwani rahulbhojwani2...@gmail.com wrote: Ya thanks. I can see that lambda

Re: Does MLlib Naive Bayes implementation incorporates Laplase smoothing?

2014-07-10 Thread Rahul Bhojwani
/jira/browse/SPARK/ Bertrand On Thu, Jul 10, 2014 at 2:37 PM, Rahul Bhojwani rahulbhojwani2...@gmail.com wrote: And also that there is a small bug in implementation. As I mentioned this earlier also. This is my first time I am reporting some bug. So I just wanted to ask, that do your name

Error using MLlib-NaiveBayes : Matrices are not aligned

2014-07-09 Thread Rahul Bhojwani
I am using Naive Bayes in MLlib . Below I have printed log of *model.theta*. after training on train data. You can check that it contains 9 features for 2 class classification. print numpy.log(model.theta) [[ 0.31618962 0.16636852 0.07200358 0.05411449 0.08542039 0.17620751 0.03711986

Spark 0.9.1 implementation of MLlib-NaiveBayes is having bug.

2014-07-09 Thread Rahul Bhojwani
According to me there is BUG in MLlib Naive Bayes implementation in spark 0.9.1. Whom should I report this to or with whom should I discuss? I can discuss this over call as well. My Skype ID : rahul.bhijwani Phone no: +91-9945197359 Thanks, -- Rahul K Bhojwani 3rd Year B.Tech Computer

Does MLlib Naive Bayes implementation incorporates Laplase smoothing?

2014-07-09 Thread Rahul Bhojwani
The discussion is in the context of Spark 0.9.1. Does the MLlib Naive Bayes implementation incorporate Laplace smoothing? Or any other smoothing? Or does it not incorporate any smoothing? Please let me know. Thanks, -- Rahul K Bhojwani 3rd Year B.Tech Computer Science and Engineering National Institute of

Error and doubts in using Mllib Naive bayes for text clasification

2014-07-08 Thread Rahul Bhojwani
Hello, I am a novice. I want to classify text into two classes. For this purpose I want to use a Naive Bayes model. I am using Python for it. Here are the problems I am facing: *Problem 1:* I wanted to use all words as features for the bag-of-words model, which means my features will be count

Is MLlib NaiveBayes implementation for Spark 0.9.1 correct?

2014-07-08 Thread Rahul Bhojwani
Hi, I wanted to use Naive Bayes for a text classification problem. I am using Spark 0.9.1. I was curious whether the Naive Bayes implementation in Spark 0.9.1 is correct, or whether there are bugs in the Spark 0.9.1 implementation that are taken care of in Spark 1.0. My question is specific

How to incorporate the new data in the MLlib-NaiveBayes model along with predicting?

2014-07-08 Thread Rahul Bhojwani
Hi, I am using MLlib Naive Bayes for a text classification problem. I have very little training data. The data will then arrive continuously and I need to classify it as either A or B. I am training the MLlib Naive Bayes model using the training data but next time when data

Error: Could not delete temporary files.

2014-07-08 Thread Rahul Bhojwani
Hi, I am getting this error. Can anyone help explain why this error occurs? Exception in thread delete Spark temp dir C:\Users\shawn\AppData\Local\Temp\spark-27f60467-36d4-4081-aaf5-d0ad42dda560 java.io.IOException: Failed to delete:

Re: Error: Could not delete temporary files.

2014-07-08 Thread Rahul Bhojwani
of your executor being killed. For example, Yarn will do that if you're going over the requested memory limits. On Tue, Jul 8, 2014 at 12:17 PM, Rahul Bhojwani rahulbhojwani2...@gmail.com wrote: HI, I am getting this error. Can anyone help out to explain why is this error coming

Re: Error: Could not delete temporary files.

2014-07-08 Thread Rahul Bhojwani
(train_data))) file_predicted.write(msg + ## + sentiment + \n) file_predicted.close() ### If you can have a look at the code and help me out, It would be great Thanks On Wed, Jul 9, 2014 at 12:54 AM, Rahul Bhojwani rahulbhojwani2...@gmail.com wrote: Hi Marcelo. Thanks

Re: Error: Could not delete temporary files.

2014-07-08 Thread Rahul Bhojwani
) These are the logs. Can you suggest something after looking at it. On Wed, Jul 9, 2014 at 1:10 AM, Rahul Bhojwani rahulbhojwani2...@gmail.com wrote: Here I am adding my code. If you can have a look to help me out. Thanks ### import

Re: How to incorporate the new data in the MLlib-NaiveBayes model along with predicting?

2014-07-08 Thread Rahul Bhojwani
to update the priors and conditional probabilities, which means we should also remember the number of observations for the updates. Best, Xiangrui On Tue, Jul 8, 2014 at 7:35 AM, Rahul Bhojwani rahulbhojwani2...@gmail.com wrote: Hi, I am using the MLlib Naive Bayes for a text classification
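The point about remembering the number of observations can be sketched in plain Python. This is a hedged illustration under stated assumptions, not MLlib's API: the class `CountingNaiveBayes` and its methods are hypothetical, and the model is a simple multinomial Naive Bayes with additive smoothing. It stores raw counts rather than final probabilities, so new data only increments the counts and the priors and conditional probabilities are recomputed from them at prediction time.

```python
import math
from collections import defaultdict


class CountingNaiveBayes:
    """Hypothetical multinomial Naive Bayes that keeps raw counts for updates."""

    def __init__(self, num_features, lam=1.0):
        self.num_features = num_features
        self.lam = lam  # additive smoothing constant
        self.class_counts = defaultdict(int)  # observations seen per label
        self.feature_counts = defaultdict(lambda: [0.0] * num_features)

    def update(self, label, features):
        """Fold one labeled observation into the stored counts."""
        self.class_counts[label] += 1
        sums = self.feature_counts[label]
        for i, value in enumerate(features):
            sums[i] += value

    def predict(self, features):
        """Return the label with the highest log posterior score."""
        total_obs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n in self.class_counts.items():
            sums = self.feature_counts[label]
            denom = sum(sums) + self.lam * self.num_features
            score = math.log(n / total_obs)  # prior from observation counts
            for i, value in enumerate(features):
                score += value * math.log((sums[i] + self.lam) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = CountingNaiveBayes(num_features=2)
model.update("A", [3, 0])
model.update("B", [0, 3])
first = model.predict([2, 0])

# New data arrives later: only the counts change, no full retraining.
model.update("B", [1, 4])
second = model.predict([0, 2])
```

Keeping counts instead of normalized probabilities is the design choice that makes the update cheap; probabilities alone would not say how much weight the earlier data should carry.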

Re: Is MLlib NaiveBayes implementation for Spark 0.9.1 correct?

2014-07-08 Thread Rahul Bhojwani
the case for text classificiation. I would recommend upgrading to v1.0. -Xiangrui On Tue, Jul 8, 2014 at 7:20 AM, Rahul Bhojwani rahulbhojwani2...@gmail.com wrote: Hi, I wanted to use Naive Bayes for a text classification problem.I am using Spark 0.9.1. I was just curious to ask

Re: Error and doubts in using Mllib Naive bayes for text clasification

2014-07-08 Thread Rahul Bhojwani
need summation. Best, Xiangrui On Tue, Jul 8, 2014 at 12:01 AM, Rahul Bhojwani rahulbhojwani2...@gmail.com wrote: I am really sorry. Its actually my mistake. My problem 2 is wrong because using a single feature is a senseless thing. Sorry for the inconvenience. But still I will be waiting

Re: Error: Could not delete temporary files.

2014-07-08 Thread Rahul Bhojwani
program, so Spark can clean up after itself? On Tue, Jul 8, 2014 at 12:40 PM, Rahul Bhojwani rahulbhojwani2...@gmail.com wrote: Here I am adding my code. If you can have a look to help me out. Thanks ### import tokenizer import gettingWordLists as gl from

Local file being refrenced in mapper function

2014-05-30 Thread Rahul Bhojwani
Hi, I recently posted a question on stackoverflow but didn't get any reply. I joined the mailing list now. Can anyone of you guide me a way for the problem mentioned in http://stackoverflow.com/questions/23923966/writing-the-rdd-data-in-excel-file-along-mapping-in-apache-spark Thanks in advance

Re: Local file being refrenced in mapper function

2014-05-30 Thread Rahul Bhojwani
Thanks Marcelo, It actually made my few concepts clear. (y). On Fri, May 30, 2014 at 10:14 PM, Marcelo Vanzin van...@cloudera.com wrote: Hello there, On Fri, May 30, 2014 at 9:36 AM, Marcelo Vanzin van...@cloudera.com wrote: workbook = xlsxwriter.Workbook('output_excel.xlsx') worksheet

Re: Local file being refrenced in mapper function

2014-05-30 Thread Rahul Bhojwani
Thanks Jey, it was helpful. On Sat, May 31, 2014 at 12:45 AM, Rahul Bhojwani rahulbhojwani2...@gmail.com wrote: Thanks Marcelo, It actually made my few concepts clear. (y). On Fri, May 30, 2014 at 10:14 PM, Marcelo Vanzin van...@cloudera.com wrote: Hello there, On Fri, May 30, 2014