Re: Gaussian Mixture Model clustering

2014-09-21 Thread Meethu Mathew
at 10:38 PM, Meethu Mathew meethu.mat...@flytxt.com wrote: Hi all, Please find attached the image of the benchmark results. The table in the previous mail got messed up. Thanks. On Friday 19 September 2014 10:55 AM, Meethu Mathew wrote: Hi all

[jira] [Created] (SPARK-3588) Gaussian Mixture Model clustering

2014-09-18 Thread Meethu Mathew (JIRA)
Meethu Mathew created SPARK-3588: Summary: Gaussian Mixture Model clustering Key: SPARK-3588 URL: https://issues.apache.org/jira/browse/SPARK-3588 Project: Spark Issue Type: New Feature

[jira] [Updated] (SPARK-3588) Gaussian Mixture Model clustering

2014-09-18 Thread Meethu Mathew (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Meethu Mathew updated SPARK-3588: - Description: Gaussian Mixture Models (GMM) are a popular technique for soft clustering. GMM

[jira] [Updated] (SPARK-3588) Gaussian Mixture Model clustering

2014-09-18 Thread Meethu Mathew (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Meethu Mathew updated SPARK-3588: - Attachment: GMMSpark.py Gaussian Mixture Model clustering

[jira] [Commented] (SPARK-3588) Gaussian Mixture Model clustering

2014-09-18 Thread Meethu Mathew (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138782#comment-14138782 ] Meethu Mathew commented on SPARK-3588: -- We are interested in contributing

Gaussian Mixture Model clustering

2014-09-18 Thread Meethu Mathew
. -- Regards, Meethu Mathew, Engineer, Flytxt F: +91 471.2700202 www.flytxt.com | Visit our blog http://blog.flytxt.com/ | Follow us http://www.twitter.com/flytxt | Connect on Linkedin http://www.linkedin.com/home?trk=hb_tab_home_top

Python version of kmeans

2014-09-17 Thread MEETHU MATHEW
Hi all, I need the kmeans code written against PySpark for some testing purposes. Can somebody tell me the difference between these two files? spark-1.0.1/examples/src/main/python/kmeans.py and spark-1.0.1/python/pyspark/mllib/clustering.py Thanks Regards, Meethu M

Re: how to specify columns in groupby

2014-08-29 Thread MEETHU MATHEW
d2 = d1.reduceByKey(_+_) d2.foreach(println) 2014-08-28 20:04 GMT+08:00 MEETHU MATHEW meethu2...@yahoo.co.in: Hi all, I have an RDD which has values in the format id,date,cost. I want to group the elements based on the id and date columns and get the sum of the cost for each group

how to specify columns in groupby

2014-08-28 Thread MEETHU MATHEW
Hi all, I have an RDD which has values in the format id,date,cost. I want to group the elements based on the id and date columns and get the sum of the cost for each group. Can somebody tell me how to do this? Thanks Regards, Meethu M
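The reply in this thread suggests keying each record by the grouping columns and summing with reduceByKey. The same aggregation can be sketched in plain Python, no Spark required (the sample records and field order here are assumptions for illustration; in PySpark the equivalent would be something like rdd.map(lambda r: ((r[0], r[1]), float(r[2]))).reduceByKey(lambda a, b: a + b)):

```python
from collections import defaultdict

# Hypothetical records in the id,date,cost format described in the question.
records = [
    ("id1", "2014-08-01", 10.0),
    ("id1", "2014-08-01", 5.0),
    ("id2", "2014-08-02", 7.5),
]

# Key by (id, date) and sum the cost per key -- the reduceByKey pattern.
totals = defaultdict(float)
for rid, date, cost in records:
    totals[(rid, date)] += cost

print(dict(totals))
# {('id1', '2014-08-01'): 15.0, ('id2', '2014-08-02'): 7.5}
```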

Re: Losing Executors on cluster with RDDs of 100GB

2014-08-26 Thread MEETHU MATHEW
Hi, Please give it a try by changing the worker memory such that worker memory > executor memory. Thanks Regards, Meethu M On Friday, 22 August 2014 5:18 PM, Yadid Ayzenberg ya...@media.mit.edu wrote: Hi all, I have a spark cluster of 30 machines, 16GB / 8 cores on each running in standalone

Re: OutOfMemory Error

2014-08-20 Thread MEETHU MATHEW
Hi, How to increase the heap size? What is the difference between Spark executor memory and heap size? Thanks Regards, Meethu M On Monday, 18 August 2014 12:35 PM, Akhil Das ak...@sigmoidanalytics.com wrote: I believe spark.shuffle.memoryFraction is the one you are looking for.

Use of SPARK_DAEMON_JAVA_OPTS

2014-07-23 Thread MEETHU MATHEW
Hi all, Sorry for taking this topic up again; I am still confused on this. I set SPARK_DAEMON_JAVA_OPTS=-XX:+UseCompressedOops -Xmx8g. When I run my application, I got the following line in the logs. Spark Command: java -cp

Re: Error with spark-submit (formatting corrected)

2014-07-18 Thread MEETHU MATHEW
Hi, Instead of spark://10.1.3.7:7077, use spark://vmsparkwin1:7077. Try this: $ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://vmsparkwin1:7077 --executor-memory 1G --total-executor-cores 2 ./lib/spark-examples-1.0.0-hadoop2.2.0.jar 10 Thanks Regards, Meethu M

Re: Pysparkshell are not listing in the web UI while running

2014-07-17 Thread MEETHU MATHEW
17, 2014 at 1:35 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi all, I just upgraded to Spark 1.0.1. In Spark 1.0.0, when I start the IPython notebook using the following command, it used to come up in the running applications tab in the master:8080 web UI. IPYTHON_OPTS="notebook --pylab inline"

Difference between collect() and take(n)

2014-07-10 Thread MEETHU MATHEW
Hi all, I want to know how collect() works, and how it is different from take(). I am just reading a file of 330MB which has 43 lakh rows with 13 columns and calling take(430) to save to a variable. But the same is not working with collect(). So is there any difference in the operation of
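The difference the question is after: collect() pulls every element of the RDD back to the driver (and can exhaust driver memory on a 43-lakh-row dataset), while take(n) returns only the first n elements, scanning as few partitions as needed. How much data each call touches can be sketched with a plain-Python generator standing in for the RDD (no Spark needed):

```python
from itertools import islice

consumed = 0

def rows():
    """Stand-in for the 43-lakh-row RDD; counts how many rows are produced."""
    global consumed
    for i in range(4_300_000):
        consumed += 1
        yield i

# Analogue of take(430): only the requested prefix is ever produced.
first_430 = list(islice(rows(), 430))
print(consumed)   # 430
# list(rows()) -- the collect() analogue -- would produce all 4.3M rows.
```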

Contribution to MLlib

2014-07-09 Thread MEETHU MATHEW
Hi, I am interested in contributing a clustering algorithm towards MLlib of Spark. I am focusing on the Gaussian Mixture Model. But I saw a JIRA @ https://spark-project.atlassian.net/browse/SPARK-952 regarding the same. I would like to know whether Gaussian Mixture Model is already implemented or
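For readers unfamiliar with the algorithm being proposed: a Gaussian Mixture Model is fit with expectation-maximization, alternating soft assignments (E-step) with parameter re-estimation (M-step). Below is a minimal one-dimensional, two-component EM sketch in pure Python; it is illustrative only and is not the distributed, multivariate implementation the thread proposes for MLlib (all data and initial values are made up):

```python
import math
import random

def gauss_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

random.seed(0)
# Synthetic data from two well-separated Gaussians.
data = ([random.gauss(0.0, 1.0) for _ in range(200)]
        + [random.gauss(5.0, 1.0) for _ in range(200)])

w, mu, var = [0.5, 0.5], [min(data), max(data)], [1.0, 1.0]
for _ in range(50):
    # E-step: responsibility of each component for each point.
    resp = []
    for x in data:
        p = [w[k] * gauss_pdf(x, mu[k], var[k]) for k in range(2)]
        s = sum(p)
        resp.append([pk / s for pk in p])
    # M-step: re-estimate weights, means, and variances.
    for k in range(2):
        nk = sum(r[k] for r in resp)
        w[k] = nk / len(data)
        mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
        var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk

print(sorted(round(m, 1) for m in mu))   # means recovered near 0 and 5
```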

Re: org.jboss.netty.channel.ChannelException: Failed to bind to: master/1xx.xx..xx:0

2014-07-02 Thread MEETHU MATHEW
happen to set SPARK_PUBLIC_DNS or something of that kind? This error suggests the worker is trying to bind a server on the master's IP, which clearly doesn't make sense. On Mon, Jun 30, 2014 at 11:59 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi, I did netstat -na | grep

Re: org.jboss.netty.channel.ChannelException: Failed to bind to: master/1xx.xx..xx:0

2014-07-01 Thread MEETHU MATHEW
ak...@sigmoidanalytics.com wrote: Are you sure you have this IP 192.168.125.174 bound for that machine? (netstat -na | grep 192.168.125.174) Thanks Best Regards On Mon, Jun 30, 2014 at 5:34 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi all, I reinstalled Spark, rebooted the system

Failed to launch Worker

2014-07-01 Thread MEETHU MATHEW
Hi, I am using Spark Standalone mode with one master and 2 slaves. I am not able to start the workers and connect them to the master using ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://x.x.x.174:7077 The log says Exception in thread main

Re: Failed to launch Worker

2014-07-01 Thread MEETHU MATHEW
-Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker spark://x.x.x.174:7077 Thanks Best Regards On Tue, Jul 1, 2014 at 6:08 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi, I am using Spark Standalone mode with one master and 2 slaves. I am not able

Re: org.jboss.netty.channel.ChannelException: Failed to bind to: master/1xx.xx..xx:0

2014-06-30 Thread MEETHU MATHEW
org.apache.spark.deploy.worker.Worker spark://master:7077 Can somebody tell me a solution? Thanks Regards, Meethu M On Friday, 27 June 2014 4:28 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi, yes, I tried setting another port too, but the same problem; master is set in /etc/hosts. Thanks Regards, Meethu M

Re: How to control a spark application(executor) using memory amount per node?

2014-06-30 Thread MEETHU MATHEW
Hi, Try setting --driver-java-options with spark-submit, or set spark.executor.extraJavaOptions in spark-defaults.conf. Thanks Regards, Meethu M On Monday, 30 June 2014 1:28 PM, hansen han...@neusoft.com wrote: Hi, When I send the following statements in spark-shell: val file =

Re: org.jboss.netty.channel.ChannelException: Failed to bind to: master/1xx.xx..xx:0

2014-06-27 Thread MEETHU MATHEW
to port 0? 192.168.125.174:0 :/ Check the IP address of that master machine (ifconfig); it looks like the IP address has been changed (hoping you are running these machines on a LAN). Thanks Best Regards On Fri, Jun 27, 2014 at 12:00 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi all, My Spark

Re: join operation is taking too much time

2014-06-18 Thread MEETHU MATHEW
spark.shuffle.spill to false? 2014-06-17 5:59 GMT-07:00 MEETHU MATHEW meethu2...@yahoo.co.in: Hi all, I want to do a recursive leftOuterJoin between an RDD (created from a file) with 9 million rows (size of the file is 100MB) and 30 other RDDs (created from 30 diff files in each iteration of a loop

options set in spark-env.sh is not reflecting on actual execution

2014-06-18 Thread MEETHU MATHEW
Hi all, I have a doubt regarding the options in spark-env.sh. I set the following values in the file on the master and 2 workers: SPARK_WORKER_MEMORY=7g SPARK_EXECUTOR_MEMORY=6g SPARK_DAEMON_JAVA_OPTS+=-Dspark.akka.timeout=30 -Dspark.akka.frameSize=1 -Dspark.blockManagerHeartBeatMs=80

join operation is taking too much time

2014-06-17 Thread MEETHU MATHEW
Hi all, I want to do a recursive leftOuterJoin between an RDD (created from a file) with 9 million rows (size of the file is 100MB) and 30 other RDDs (created from 30 diff files in each iteration of a loop) varying from 1 to 6 million rows. When I run it for 5 RDDs, it's running successfully in
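What each leftOuterJoin in that loop computes can be sketched with plain-Python dicts: every key of the left RDD is kept, and keys missing on the right side pair with None, as in Spark's RDD.leftOuterJoin (keys and values here are made-up illustrations):

```python
# Left side: the 9-million-row RDD, keyed. Right side: one of the 30 RDDs.
left = {"a": 1, "b": 2, "c": 3}
right = {"a": 10, "c": 30}

# leftOuterJoin semantics: keep all left keys; unmatched right values -> None.
joined = {k: (v, right.get(k)) for k, v in left.items()}
print(joined)   # {'a': (1, 10), 'b': (2, None), 'c': (3, 30)}
```

Doing this 30 times in a loop re-shuffles the (growing) left side on every iteration, which is one reason the recursive join gets slow as the iteration count rises.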

Re: Wildcard support in input path

2014-06-17 Thread MEETHU MATHEW
Hi Jianshi, I have used wildcard characters (*) in my program and it worked. My code was like this: b = sc.textFile("hdfs:///path to file/data_file_2013SEP01*") Thanks Regards, Meethu M On Wednesday, 18 June 2014 9:29 AM, Jianshi Huang jianshi.hu...@gmail.com wrote: It would be

ArrayIndexOutOfBoundsException when reading bzip2 files

2014-06-09 Thread MEETHU MATHEW
Hi, I am getting an ArrayIndexOutOfBoundsException while reading from bz2 files in HDFS. I have come across the same issue in JIRA at https://issues.apache.org/jira/browse/SPARK-1861, but it seems to be resolved. I have tried the workaround suggested (SPARK_WORKER_CORES=1), but it's still showing

Re: ArrayIndexOutOfBoundsException when reading bzip2 files

2014-06-09 Thread MEETHU MATHEW
the piece of code!? Thanks Best Regards On Mon, Jun 9, 2014 at 5:24 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi, I am getting ArrayIndexOutOfBoundsException while reading from bz2 files  in HDFS.I have come across the same issue in JIRA at  https://issues.apache.org/jira/browse/SPARK

Re: ArrayIndexOutOfBoundsException when reading bzip2 files

2014-06-09 Thread MEETHU MATHEW
, 2014 at 7:54 AM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi, I am getting ArrayIndexOutOfBoundsException while reading from bz2 files  in HDFS.I have come across the same issue in JIRA at https://issues.apache.org/jira/browse/SPARK-1861, but it seems to be resolved. I have tried

How to stop a running SparkContext in the proper way?

2014-06-03 Thread MEETHU MATHEW
Hi, I want to know how I can stop a running SparkContext in a proper way so that next time when I start a new SparkContext, the web UI can be launched on the same port 4040. Now when I quit the job using Ctrl+Z, the new SparkContexts are launched on new ports. I have the same problem with ipython
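The usual answer to this question: call sc.stop() (or exit the shell with Ctrl+D or exit()) instead of Ctrl+Z. Ctrl+Z only suspends the process, so the old context keeps port 4040 bound and the next SparkContext must pick a new port. The stop-in-finally pattern that guarantees the port is released can be sketched with a plain TCP socket standing in for the SparkContext's web-UI port (pyspark itself is not needed to run this):

```python
import socket

def bind_release(port):
    """Bind a listening socket (analogue of the UI coming up), then
    release it in finally (analogue of sc.stop())."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        s.bind(("127.0.0.1", port))
    finally:
        s.close()

# Ask the OS for a free port, then bind it twice in a row.
probe = socket.socket()
probe.bind(("127.0.0.1", 0))
port = probe.getsockname()[1]
probe.close()

bind_release(port)
bind_release(port)   # succeeds only because the first bind was released
print("port reused cleanly")
```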

Running KMeans with the new spark bindings

2014-05-22 Thread MEETHU MATHEW
Hi, I am a beginner in Mahout. I have run kmeans clustering for various datasets. Can anyone tell me how to run kmeans in Mahout with the new Spark bindings? Thanks, Meethu M

Contributions to MLlib

2014-05-22 Thread MEETHU MATHEW
Hi, I would like to do some contributions towards MLlib. I have a few concerns regarding the same. 1. Is there any reason for implementing the algorithms supported by MLlib in Scala? 2. Will you accept contributions done in Python or Java? Thanks, Meethu M
