How is the predict() working in LogisticRegressionModel?

2015-11-13 Thread MEETHU MATHEW
Hi all, Can somebody point me to the implementation of predict() in LogisticRegressionModel of Spark MLlib? I could find a predictPoint() in the class LogisticRegressionModel, but where is predict()? Thanks & Regards, Meethu M

Re: Please reply if you use Mesos fine grained mode

2015-11-03 Thread MEETHU MATHEW
Hi, We are using Mesos fine-grained mode because we can have multiple instances of Spark sharing machines, and each application gets resources allocated dynamically. Thanks & Regards, Meethu M On Wednesday, 4 November 2015 5:24 AM, Reynold Xin wrote: If you

Re: Best way to merge final output part files created by Spark job

2015-09-17 Thread MEETHU MATHEW
Try coalesce(1) before writing. Thanks & Regards, Meethu M On Tuesday, 15 September 2015 6:49 AM, java8964 wrote:
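A minimal PySpark sketch of the coalesce(1) suggestion above; the app name, input path, output path, and the sample transformation are placeholders, not from the original thread.

    from pyspark import SparkContext

    sc = SparkContext(appName="MergeOutputExample")  # hypothetical app name

    # Read the data, do some work, then shrink to a single partition before
    # writing so that saveAsTextFile() produces one part file instead of many.
    data = sc.textFile("hdfs:///tmp/input")                           # assumed input path
    result = data.map(lambda line: line.upper())                      # placeholder transformation
    result.coalesce(1).saveAsTextFile("hdfs:///tmp/merged_output")    # assumed output path

    sc.stop()

Note that coalesce(1) funnels the whole dataset through a single task, so it is only practical when the final output comfortably fits on one executor.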

Re: make-distribution.sh failing at spark/R/lib/sparkr.zip

2015-08-13 Thread MEETHU MATHEW
On Wed, Aug 12, 2015 at 3:08 PM, Burak Yavuz brk...@gmail.com wrote: Are you running from master? Could you delete line 222 of make-distribution.sh? We updated when we build sparkr.zip. I'll submit a fix for it for 1.5 and master. Burak On Wed, Aug 12, 2015 at 3:31 AM, MEETHU MATHEW meethu2

Re: Combining Spark Files with saveAsTextFile

2015-08-06 Thread MEETHU MATHEW
Hi, Try using coalesce(1) before calling saveAsTextFile(). Thanks & Regards, Meethu M On Wednesday, 5 August 2015 7:53 AM, Brandon White bwwintheho...@gmail.com wrote: What is the best way to make saveAsTextFile save as only a single file?

RE: Building scaladoc using build/sbt unidoc failure

2015-07-10 Thread MEETHU MATHEW
Hi, I am getting the assertion error while trying to run build/sbt unidoc, the same as you described in "Building scaladoc using build/sbt unidoc failure". Could you tell me how you got it working? Building scaladoc using build/sbt unidoc failure Hello, I am trying to build

Re: How to create fewer output files for Spark job ?

2015-06-04 Thread MEETHU MATHEW
Try using coalesce. Thanks & Regards, Meethu M On Wednesday, 3 June 2015 11:26 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: I am running a series of spark functions with 9000 executors and it is resulting in 9000+ files, which is exceeding the namespace file count quota. How can Spark

Re: How to run multiple jobs in one sparkcontext from separate threads in pyspark?

2015-05-20 Thread MEETHU MATHEW
operation in multiple threads within a function, or do you want to run multiple jobs using multiple threads? I am wondering why the Python thread module can't be used? Or have you already given it a try? On 18 May 2015 16:39, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi Akhil, The python wrapper

Re: How to run multiple jobs in one sparkcontext from separate threads in pyspark?

2015-05-18 Thread MEETHU MATHEW
...@sigmoidanalytics.com wrote: Did you happen to have a look at the spark job server? Someone wrote a python wrapper around it, give it a try. Thanks Best Regards On Thu, May 14, 2015 at 11:10 AM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi all, Quote: Inside a given Spark application (SparkContext instance

Re: Restricting the number of iterations in Mllib Kmeans

2015-05-18 Thread MEETHU MATHEW
Hi, I think you can't supply an initial set of centroids to kmeans. Thanks & Regards, Meethu M On Friday, 15 May 2015 12:37 AM, Suman Somasundar suman.somasun...@oracle.com wrote:

How to run multiple jobs in one sparkcontext from separate threads in pyspark?

2015-05-13 Thread MEETHU MATHEW
Hi all, Quote: "Inside a given Spark application (SparkContext instance), multiple parallel jobs can run simultaneously if they were submitted from separate threads." How to run multiple jobs in one SparkContext using separate threads in pyspark? I found some examples in Scala and Java, but
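A minimal sketch of the idea in PySpark, submitting two jobs against one SparkContext from separate threads with the standard Python threading module; the app name, data, and job bodies are assumptions for illustration.

    import threading
    from pyspark import SparkContext

    sc = SparkContext(appName="MultiJobExample")  # hypothetical app name

    def run_job(name, data):
        # Each thread triggers its own action; Spark schedules the resulting
        # jobs concurrently within the single SparkContext.
        rdd = sc.parallelize(data, 4)
        total = rdd.map(lambda x: x * x).sum()
        print(name, total)

    t1 = threading.Thread(target=run_job, args=("job-1", range(1000)))
    t2 = threading.Thread(target=run_job, args=("job-2", range(2000)))
    t1.start(); t2.start()
    t1.join(); t2.join()

    sc.stop()

By default the jobs queue up FIFO; setting spark.scheduler.mode to FAIR lets concurrently submitted jobs share cluster resources more evenly.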

Spark-1.3.0 UI shows 0 cores in completed applications tab

2015-03-26 Thread MEETHU MATHEW
Hi all, I started spark-shell in spark-1.3.0 and did some actions. The UI was showing 8 cores under the running applications tab. But when I exited the spark-shell using exit, the application was moved to the completed applications tab and the number of cores shown was 0. Again when I exited the

How to build Spark and run examples using Intellij ?

2015-03-09 Thread MEETHU MATHEW
Hi, I am trying to run the examples of Spark (master branch from git) from IntelliJ (14.0.2) but facing errors. These are the steps I followed: 1. git clone the master branch of Apache Spark. 2. Build it using mvn -DskipTests clean install. 3. In IntelliJ select Import Projects and choose the POM.xml

How to read from hdfs using spark-shell in Intel hadoop?

2015-02-26 Thread MEETHU MATHEW
Hi, I am not able to read from HDFS (Intel distribution Hadoop, Hadoop version 1.0.3) from spark-shell (Spark version 1.2.1). I built Spark using the command mvn -Dhadoop.version=1.0.3 clean package, started spark-shell and read an HDFS file using sc.textFile(), and the exception is WARN

Re: Mllib Error

2014-12-11 Thread MEETHU MATHEW
Hi, Try this. Change spark-mllib to spark-mllib_2.10: libraryDependencies ++= Seq( "org.apache.spark" % "spark-core_2.10" % "1.1.1", "org.apache.spark" % "spark-mllib_2.10" % "1.1.1" ) Thanks & Regards, Meethu M On Friday, 12 December 2014 12:22 PM, amin mohebbi aminn_...@yahoo.com.INVALID wrote:

Re: How to incrementally compile spark examples using mvn

2014-12-04 Thread MEETHU MATHEW
PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi, I have a similar problem. I modified the code in mllib and examples. I did mvn install -pl mllib and mvn install -pl examples. But when I run the program in examples using run-example, the older version of mllib (before the changes were made

Re: How to incrementally compile spark examples using mvn

2014-11-28 Thread MEETHU MATHEW
Hi, I have a similar problem. I modified the code in mllib and examples. I did mvn install -pl mllib and mvn install -pl examples. But when I run the program in examples using run-example, the older version of mllib (before the changes were made) is getting executed. How to get the changes made in mllib

Re: ISpark class not found

2014-11-11 Thread MEETHU MATHEW
Hi, I was also trying ISpark, but I couldn't even start the notebook. I am getting the following error: ERROR:tornado.access:500 POST /api/sessions (127.0.0.1) 10.15ms referer=http://localhost:/notebooks/Scala/Untitled0.ipynb How did you start the notebook? Thanks & Regards, Meethu M

Is there a step-by-step instruction on how to build Spark App with IntelliJ IDEA?

2014-11-10 Thread MEETHU MATHEW
Hi, This question was asked earlier and I did it in the way specified. I am getting java.lang.ClassNotFoundException. Can somebody explain all the steps required to build a Spark app using IntelliJ (latest version), starting from creating the project to running it? I searched a lot but couldn't

Re: Relation between worker memory and executor memory in standalone mode

2014-10-07 Thread MEETHU MATHEW
Try to set --total-executor-cores to limit how many total cores it can use. Thanks Regards, Meethu M On Thursday, 2 October 2014 2:39 AM, Akshat Aranya aara...@gmail.com wrote: I guess one way to do so would be to run 1 worker per node, like say, instead of running 1 worker and giving
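For reference, --total-executor-cores corresponds to the spark.cores.max property in standalone mode, so the same cap can be applied programmatically. A minimal PySpark sketch; the app name and the values are chosen only for illustration.

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("CoreLimitExample")       # hypothetical app name
            .set("spark.cores.max", "4")          # cap total cores used across the cluster
            .set("spark.executor.memory", "4g"))  # per-executor memory
    sc = SparkContext(conf=conf)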

Same code --works in spark 1.0.2-- but not in spark 1.1.0

2014-10-07 Thread MEETHU MATHEW
Hi all, My code was working fine in spark 1.0.2, but after upgrading to 1.1.0, it is throwing exceptions and tasks are failing. The code contains some map and filter transformations followed by groupByKey (reduceByKey in another version of the code). What I could find out is that the code works fine

Python version of kmeans

2014-09-17 Thread MEETHU MATHEW
Hi all, I need the kmeans code written against PySpark for some testing purposes. Can somebody tell me the difference between these two files: spark-1.0.1/examples/src/main/python/kmeans.py and spark-1.0.1/python/pyspark/mllib/clustering.py Thanks & Regards, Meethu M

Re: how to specify columns in groupby

2014-08-29 Thread MEETHU MATHEW
d2 = d1.reduceByKey(_+_) d2.foreach(println) 2014-08-28 20:04 GMT+08:00 MEETHU MATHEW meethu2...@yahoo.co.in: Hi all, I have an RDD which has values in the format id,date,cost. I want to group the elements based on the id and date columns and get the sum of the cost for each group
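A minimal PySpark sketch of the approach suggested above: key each record by (id, date) and sum the cost with reduceByKey. The sample records, field order, and app name are assumptions for illustration.

    from operator import add
    from pyspark import SparkContext

    sc = SparkContext(appName="GroupBySumExample")  # hypothetical app name

    # Records in "id,date,cost" form, as described in the question.
    lines = sc.parallelize(["1,2014-08-01,10.0",
                            "1,2014-08-01,5.0",
                            "2,2014-08-02,7.5"])

    d1 = lines.map(lambda line: line.split(",")) \
              .map(lambda f: ((f[0], f[1]), float(f[2])))  # key by (id, date)
    d2 = d1.reduceByKey(add)                               # sum the cost per group

    for ((id_, date), total) in d2.collect():
        print(id_, date, total)

    sc.stop()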

how to specify columns in groupby

2014-08-28 Thread MEETHU MATHEW
Hi all, I have an RDD which has values in the format id,date,cost. I want to group the elements based on the id and date columns and get the sum of the cost for each group. Can somebody tell me how to do this? Thanks & Regards, Meethu M

Re: Losing Executors on cluster with RDDs of 100GB

2014-08-26 Thread MEETHU MATHEW
Hi, Please give it a try by changing the worker memory such that worker memory > executor memory. Thanks & Regards, Meethu M On Friday, 22 August 2014 5:18 PM, Yadid Ayzenberg ya...@media.mit.edu wrote: Hi all, I have a spark cluster of 30 machines, 16GB / 8 cores on each running in standalone

Re: OutOfMemory Error

2014-08-20 Thread MEETHU MATHEW
Hi, How to increase the heap size? What is the difference between spark executor memory and heap size? Thanks & Regards, Meethu M On Monday, 18 August 2014 12:35 PM, Akhil Das ak...@sigmoidanalytics.com wrote: I believe spark.shuffle.memoryFraction is the one you are looking for.

Use of SPARK_DAEMON_JAVA_OPTS

2014-07-23 Thread MEETHU MATHEW
Hi all, Sorry for taking up this topic again; I am still confused about this. I set SPARK_DAEMON_JAVA_OPTS=-XX:+UseCompressedOops -Xmx8g and when I run my application, I got the following line in the logs. Spark Command: java -cp

Re: Error with spark-submit (formatting corrected)

2014-07-18 Thread MEETHU MATHEW
Hi, Instead of spark://10.1.3.7:7077 use spark://vmsparkwin1:7077. Try this: $ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://vmsparkwin1:7077 --executor-memory 1G --total-executor-cores 2 ./lib/spark-examples-1.0.0-hadoop2.2.0.jar 10 Thanks & Regards, Meethu M

Re: Pysparkshell are not listing in the web UI while running

2014-07-17 Thread MEETHU MATHEW
17, 2014 at 1:35 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi all, I just upgraded to spark 1.0.1. In spark 1.0.0, when I started the IPython notebook using the following command, it used to appear in the running applications tab of the master:8080 web UI. IPYTHON_OPTS="notebook --pylab inline"

Difference between collect() and take(n)

2014-07-10 Thread MEETHU MATHEW
Hi all, I want to know how collect() works, and how it is different from take(). I am just reading a file of 330MB which has 43 lakh (4.3 million) rows with 13 columns and calling take(430) to save to a variable. But the same is not working with collect(). So is there any difference in the operation of
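For context, take(n) pulls only the first n rows back to the driver while collect() pulls the entire dataset, so collect() on a large file can exhaust driver memory where take() succeeds. A minimal PySpark sketch, with the path and app name assumed.

    from pyspark import SparkContext

    sc = SparkContext(appName="TakeVsCollectExample")  # hypothetical app name

    rows = sc.textFile("hdfs:///tmp/data.csv")  # assumed path to the 330MB file

    sample = rows.take(430)        # returns only the first 430 rows to the driver
    # everything = rows.collect()  # would ship all ~4.3 million rows to the driver
    print(len(sample))

    sc.stop()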

Re: org.jboss.netty.channel.ChannelException: Failed to bind to: master/1xx.xx..xx:0

2014-07-02 Thread MEETHU MATHEW
happen to set SPARK_PUBLIC_DNS or something of that kind? This error suggests the worker is trying to bind a server on the master's IP, which clearly doesn't make sense. On Mon, Jun 30, 2014 at 11:59 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi, I did netstat -na | grep

Re: org.jboss.netty.channel.ChannelException: Failed to bind to: master/1xx.xx..xx:0

2014-07-01 Thread MEETHU MATHEW
ak...@sigmoidanalytics.com wrote: Are you sure you have this IP 192.168.125.174 bound for that machine? (netstat -na | grep 192.168.125.174) Thanks Best Regards On Mon, Jun 30, 2014 at 5:34 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi all, I reinstalled spark, rebooted the system

Failed to launch Worker

2014-07-01 Thread MEETHU MATHEW
Hi, I am using Spark Standalone mode with one master and 2 slaves. I am not able to start the workers and connect them to the master using ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://x.x.x.174:7077 The log says Exception in thread main

Re: Failed to launch Worker

2014-07-01 Thread MEETHU MATHEW
-Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker spark://x.x.x.174:7077 Thanks Best Regards On Tue, Jul 1, 2014 at 6:08 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi, I am using Spark Standalone mode with one master and 2 slaves. I am not able

Re: org.jboss.netty.channel.ChannelException: Failed to bind to: master/1xx.xx..xx:0

2014-06-30 Thread MEETHU MATHEW
org.apache.spark.deploy.worker.Worker spark://master:7077 Can somebody tell me a solution? Thanks & Regards, Meethu M On Friday, 27 June 2014 4:28 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi, yes, I tried setting another port also, but the same problem. master is set in /etc/hosts Thanks & Regards, Meethu M

Re: How to control a spark application(executor) using memory amount per node?

2014-06-30 Thread MEETHU MATHEW
Hi, Try setting --driver-java-options with spark-submit or set spark.executor.extraJavaOptions in spark-defaults.conf. Thanks & Regards, Meethu M On Monday, 30 June 2014 1:28 PM, hansen han...@neusoft.com wrote: Hi, When I run the following statements in spark-shell: val file =
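A minimal PySpark sketch of the second suggestion, setting spark.executor.extraJavaOptions through SparkConf instead of spark-defaults.conf; the app name and JVM flags are placeholders.

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("ExecutorJavaOptsExample")  # hypothetical app name
            .set("spark.executor.extraJavaOptions",
                 "-XX:+UseCompressedOops -verbose:gc"))  # placeholder JVM flags
    sc = SparkContext(conf=conf)

Driver-side JVM options generally have to be passed before the driver JVM starts, which is why the reply points to --driver-java-options on spark-submit rather than a SparkConf setting.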

Re: org.jboss.netty.channel.ChannelException: Failed to bind to: master/1xx.xx..xx:0

2014-06-27 Thread MEETHU MATHEW
to port 0? 192.168.125.174:0 :/ Check the IP address of that master machine (ifconfig); it looks like the IP address has been changed (hoping you are running these machines on a LAN). Thanks Best Regards On Fri, Jun 27, 2014 at 12:00 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi all, My Spark

Re: join operation is taking too much time

2014-06-18 Thread MEETHU MATHEW
spark.shuffle.spill to false? 2014-06-17 5:59 GMT-07:00 MEETHU MATHEW meethu2...@yahoo.co.in: Hi all, I want to do a recursive leftOuterJoin between an RDD (created from a file) with 9 million rows (the size of the file is 100MB) and 30 other RDDs (created from 30 different files in each iteration of a loop

options set in spark-env.sh is not reflecting on actual execution

2014-06-18 Thread MEETHU MATHEW
Hi all, I have a doubt regarding the options in spark-env.sh. I set the following values in the file on the master and 2 workers: SPARK_WORKER_MEMORY=7g SPARK_EXECUTOR_MEMORY=6g SPARK_DAEMON_JAVA_OPTS+=-Dspark.akka.timeout=30 -Dspark.akka.frameSize=1 -Dspark.blockManagerHeartBeatMs=80

join operation is taking too much time

2014-06-17 Thread MEETHU MATHEW
Hi all, I want to do a recursive leftOuterJoin between an RDD (created from a file) with 9 million rows (the size of the file is 100MB) and 30 other RDDs (created from 30 different files in each iteration of a loop) varying from 1 to 6 million rows. When I run it for 5 RDDs, it runs successfully in

Re: Wildcard support in input path

2014-06-17 Thread MEETHU MATHEW
Hi Jianshi, I have used wildcard characters (*) in my program and it worked. My code was like this: b = sc.textFile("hdfs:///path to file/data_file_2013SEP01*") Thanks & Regards, Meethu M On Wednesday, 18 June 2014 9:29 AM, Jianshi Huang jianshi.hu...@gmail.com wrote: It would be

ArrayIndexOutOfBoundsException when reading bzip2 files

2014-06-09 Thread MEETHU MATHEW
Hi, I am getting ArrayIndexOutOfBoundsException while reading from bz2 files in HDFS. I have come across the same issue in JIRA at https://issues.apache.org/jira/browse/SPARK-1861, but it seems to be resolved. I have tried the workaround suggested (SPARK_WORKER_CORES=1), but it is still showing

Re: ArrayIndexOutOfBoundsException when reading bzip2 files

2014-06-09 Thread MEETHU MATHEW
the piece of code!? Thanks Best Regards On Mon, Jun 9, 2014 at 5:24 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi, I am getting ArrayIndexOutOfBoundsException while reading from bz2 files in HDFS. I have come across the same issue in JIRA at https://issues.apache.org/jira/browse/SPARK

Re: ArrayIndexOutOfBoundsException when reading bzip2 files

2014-06-09 Thread MEETHU MATHEW
, 2014 at 7:54 AM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi, I am getting ArrayIndexOutOfBoundsException while reading from bz2 files in HDFS. I have come across the same issue in JIRA at https://issues.apache.org/jira/browse/SPARK-1861, but it seems to be resolved. I have tried

How to stop a running SparkContext in the proper way?

2014-06-03 Thread MEETHU MATHEW
Hi, I want to know how I can stop a running SparkContext properly so that the next time I start a new SparkContext, the web UI can be launched on the same port 4040. Now, when I quit the job using Ctrl+Z, the new SparkContexts are launched on new ports. I have the same problem with IPython
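A minimal sketch of stopping a context cleanly so the next one can reuse port 4040; Ctrl+Z only suspends the Python process, so the old context (and its UI port) stays alive. The app names are illustrative.

    from pyspark import SparkContext

    sc = SparkContext(appName="StopExample")  # hypothetical app name
    # ... run jobs ...
    sc.stop()  # releases the executors and frees the web UI port (4040)

    # A new context started afterwards can bind its UI to 4040 again.
    sc2 = SparkContext(appName="NextApp")
    sc2.stop()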