at 10:38 PM, Meethu Mathew <meethu.mat...@flytxt.com> wrote:
Hi all,
Please find attached an image of the benchmark results; the table in
the previous mail got messed up. Thanks.
On Friday 19 September 2014 10:55 AM, Meethu Mathew wrote:
Hi all,
Meethu Mathew created SPARK-3588:
Summary: Gaussian Mixture Model clustering
Key: SPARK-3588
URL: https://issues.apache.org/jira/browse/SPARK-3588
Project: Spark
Issue Type: New Feature
[ https://issues.apache.org/jira/browse/SPARK-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Meethu Mathew updated SPARK-3588:
---------------------------------
Description:
Gaussian Mixture Models (GMMs) are a popular technique for soft clustering. GMM
[ https://issues.apache.org/jira/browse/SPARK-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Meethu Mathew updated SPARK-3588:
---------------------------------
Attachment: GMMSpark.py
Gaussian Mixture Model clustering
[ https://issues.apache.org/jira/browse/SPARK-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138782#comment-14138782 ]
Meethu Mathew commented on SPARK-3588:
--------------------------------------
We are interested in contributing.
--
Regards,
Meethu Mathew
Engineer
Flytxt
F: +91 471.2700202
www.flytxt.com | Visit our blog: http://blog.flytxt.com/ | Follow us: http://www.twitter.com/flytxt | Connect on LinkedIn: http://www.linkedin.com/home?trk=hb_tab_home_top
Hi all,
I need the k-means code written against PySpark for some testing purposes.
Can somebody tell me the difference between these two files:
spark-1.0.1/examples/src/main/python/kmeans.py and
spark-1.0.1/python/pyspark/mllib/clustering.py?
Thanks & Regards,
Meethu M
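For context: examples/src/main/python/kmeans.py is a self-contained reference implementation of k-means written directly against the RDD API (meant for learning), while python/pyspark/mllib/clustering.py is the Python wrapper around MLlib's optimized Scala implementation. A minimal sketch of calling the latter, assuming an existing SparkContext sc and a hypothetical input path:

    from numpy import array
    from pyspark.mllib.clustering import KMeans

    # Each line is a space-separated point, e.g. "0.0 0.0 1.0"
    data = sc.textFile("hdfs:///kmeans_data.txt").map(
        lambda line: array([float(x) for x in line.split(' ')]))

    # Train with MLlib's k-means; k=2 and maxIterations=10 are illustrative
    model = KMeans.train(data, 2, maxIterations=10)
    print(model.clusterCenters)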
val d2 = d1.reduceByKey(_ + _) // sum the values that share the same key
d2.foreach(println)            // note: in cluster mode this prints on the executors
2014-08-28 20:04 GMT+08:00 MEETHU MATHEW meethu2...@yahoo.co.in:
Hi all,
I have an RDD which has values in the format id,date,cost.
I want to group the elements based on the id and date columns and get the sum
of the cost for each group.
Hi all,
I have an RDD which has values in the format id,date,cost.
I want to group the elements based on the id and date columns and get the sum
of the cost for each group.
Can somebody tell me how to do this?
Thanks & Regards,
Meethu M
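For reference, a PySpark sketch of the grouping that the Scala reply above performs, assuming a hypothetical input RDD named lines:

    # lines is an RDD of "id,date,cost" strings (hypothetical)
    fields = lines.map(lambda line: line.split(','))

    # Key each row by (id, date) and sum the cost per group
    sums = (fields.map(lambda f: ((f[0], f[1]), float(f[2])))
                  .reduceByKey(lambda a, b: a + b))

    sums.take(5)  # e.g. [(('id1', '2014-08-28'), 123.0), ...]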
Hi,
Please try changing the worker memory such that worker memory > executor
memory.
Thanks & Regards,
Meethu M
On Friday, 22 August 2014 5:18 PM, Yadid Ayzenberg ya...@media.mit.edu wrote:
Hi all,
I have a spark cluster of 30 machines, 16GB / 8 cores on each running in
standalone
Hi,
How can I increase the heap size?
What is the difference between Spark executor memory and heap size?
Thanks & Regards,
Meethu M
On Monday, 18 August 2014 12:35 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
I believe spark.shuffle.memoryFraction is the one you are looking for.
Hi all,
Sorry for bringing up this topic again, but I am still confused about it.
I set SPARK_DAEMON_JAVA_OPTS="-XX:+UseCompressedOops -Xmx8g".
When I run my application, I get the following line in the logs:
Spark Command: java -cp
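As a side note, SPARK_DAEMON_JAVA_OPTS is meant for the standalone daemons (master and worker), not for executors. A minimal sketch of setting the executor heap from the application itself, with a hypothetical app name and master URL:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("memory-demo")            # hypothetical name
            .setMaster("spark://master:7077")     # hypothetical master URL
            .set("spark.executor.memory", "8g"))  # executor JVM heap (-Xmx)
    sc = SparkContext(conf=conf)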
Hi,
Instead of spark://10.1.3.7:7077, use spark://vmsparkwin1:7077. Try this:
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master
spark://vmsparkwin1:7077 --executor-memory 1G --total-executor-cores 2
./lib/spark-examples-1.0.0-hadoop2.2.0.jar 10
Thanks & Regards,
Meethu M
17, 2014 at 1:35 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote:
Hi all,
I just upgraded to Spark 1.0.1. In Spark 1.0.0, when I started the IPython notebook
using the following command, it used to appear in the Running Applications tab of
the master:8080 web UI.
IPYTHON_OPTS="notebook --pylab inline"
Hi all,
I want to know how collect() works and how it is different from take(). I am
just reading a 330MB file with 43 lakh (4.3 million) rows and 13 columns, and
calling take(430) to save the result to a variable. But the same is not working
with collect(). So is there any difference in the operation of
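The short version: take(n) only scans as many partitions as it needs to return n rows, while collect() materializes the entire RDD on the driver. A sketch, assuming a hypothetical RDD rdd:

    # Cheap: stops scanning once 430 rows have been gathered
    first_rows = rdd.take(430)

    # Pulls all ~4.3 million rows into driver memory; this is the usual
    # reason collect() fails where take() succeeds
    all_rows = rdd.collect()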
Hi,
I am interested in contributing a clustering algorithm to Spark's MLlib. I am
focusing on the Gaussian Mixture Model.
But I saw a JIRA at https://spark-project.atlassian.net/browse/SPARK-952
regarding the same. I would like to know whether the Gaussian Mixture Model is
already implemented or
happen to set SPARK_PUBLIC_DNS or something of
that kind? This error suggests the worker is trying to bind a server on the
master's IP, which clearly doesn't make sense.
On Mon, Jun 30, 2014 at 11:59 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote:
Hi,
I did netstat -na | grep
ak...@sigmoidanalytics.com wrote:
Are you sure the IP 192.168.125.174 is bound on that machine? (netstat
-na | grep 192.168.125.174)
Thanks
Best Regards
On Mon, Jun 30, 2014 at 5:34 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote:
Hi all,
I reinstalled Spark and rebooted the system
Hi,
I am using Spark standalone mode with one master and 2 slaves. I am not able to
start the workers and connect them to the master using
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://x.x.x.174:7077
The log says
Exception in thread "main"
-Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m
org.apache.spark.deploy.worker.Worker spark://x.x.x.174:7077
Thanks
Best Regards
On Tue, Jul 1, 2014 at 6:08 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote:
Hi,
I am using Spark standalone mode with one master and 2 slaves. I am not able
org.apache.spark.deploy.worker.Worker spark://master:7077
Can somebody suggest a solution?
Thanks & Regards,
Meethu M
On Friday, 27 June 2014 4:28 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote:
Hi,
Yes, I tried setting another port as well, but the same problem persists.
master is set in /etc/hosts.
Thanks & Regards,
Meethu M
Hi,
Try setting --driver-java-options with spark-submit, or set
spark.executor.extraJavaOptions in spark-defaults.conf.
Thanks & Regards,
Meethu M
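The same executor option can also be set programmatically instead of in spark-defaults.conf; a minimal sketch, where the -XX flag is purely illustrative:

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().set("spark.executor.extraJavaOptions",
                           "-XX:+PrintGCDetails")  # illustrative JVM flag
    sc = SparkContext(conf=conf)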
On Monday, 30 June 2014 1:28 PM, hansen han...@neusoft.com wrote:
Hi,
When I run the following statements in spark-shell:
val file =
to port 0? 192.168.125.174:0 :/
Check the IP address of that master machine (ifconfig); it looks like the IP
address has changed (hoping you are running these machines on a LAN).
Thanks
Best Regards
On Fri, Jun 27, 2014 at 12:00 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote:
Hi all,
My Spark
spark.shuffle.spill to false?
2014-06-17 5:59 GMT-07:00 MEETHU MATHEW meethu2...@yahoo.co.in:
Hi all,
I want to do a recursive leftOuterJoin between an RDD (created from a file)
with 9 million rows (the file is 100MB in size) and 30 other RDDs (created from
30 different files, one in each iteration of a loop
Hi all,
I have a doubt regarding the options in spark-env.sh. I set the following
values in the file on the master and the 2 workers:
SPARK_WORKER_MEMORY=7g
SPARK_EXECUTOR_MEMORY=6g
SPARK_DAEMON_JAVA_OPTS+="-Dspark.akka.timeout=30
-Dspark.akka.frameSize=1 -Dspark.blockManagerHeartBeatMs=80"
Hi all,
I want to do a recursive leftOuterJoin between an RDD (created from a file)
with 9 million rows (the file is 100MB in size) and 30 other RDDs (created from
30 different files, one in each iteration of a loop) varying from 1 to 6 million rows.
When I run it for 5 RDDs, it runs successfully in
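A rough sketch of the looped join described above; the parse helper and the file paths are hypothetical placeholders:

    # Fold 30 smaller RDDs into the 9-million-row base RDD one by one
    def parse(line):
        fields = line.split(',')
        return (fields[0], fields[1:])  # (join key, rest of row)

    base = sc.textFile("hdfs:///base_file").map(parse)
    for i in range(30):
        other = sc.textFile("hdfs:///file_%d" % i).map(parse)
        base = base.leftOuterJoin(other)  # lineage grows with every iteration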
Hi Jianshi,
I have used wildcard characters (*) in my program and it worked.
My code was like this:
b = sc.textFile("hdfs:///path to file/data_file_2013SEP01*")
Thanks & Regards,
Meethu M
On Wednesday, 18 June 2014 9:29 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
It would be
Hi,
I am getting an ArrayIndexOutOfBoundsException while reading bz2 files from
HDFS. I have come across the same issue in JIRA at
https://issues.apache.org/jira/browse/SPARK-1861, but it seems to be resolved.
I have tried the suggested workaround (SPARK_WORKER_CORES=1), but it is still
showing
the piece of code!?
Thanks
Best Regards
On Mon, Jun 9, 2014 at 5:24 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote:
Hi,
I am getting an ArrayIndexOutOfBoundsException while reading bz2 files from
HDFS. I have come across the same issue in JIRA at
https://issues.apache.org/jira/browse/SPARK
, 2014 at 7:54 AM, MEETHU MATHEW meethu2...@yahoo.co.in wrote:
Hi,
I am getting an ArrayIndexOutOfBoundsException while reading bz2 files from
HDFS. I have come across the same issue in JIRA at
https://issues.apache.org/jira/browse/SPARK-1861, but it seems to be
resolved. I have tried
Hi,
I want to know how I can stop a running SparkContext properly, so that the
next time I start a new SparkContext, its web UI can be launched on the same
port 4040. Right now, when I quit the job using Ctrl+Z, new SparkContexts are
launched on new ports.
I have the same problem with IPython.
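Assuming the running context is bound to the usual sc variable, a minimal sketch of a clean shutdown:

    # Ctrl+Z only suspends the driver process, so it stays alive and keeps
    # port 4040 (and the cluster's resources) held
    sc.stop()  # releases executors and frees the web UI port
    exit()     # then leave the shell normally instead of suspending it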
Hi,
I am a beginner with Mahout. I have run k-means clustering on various datasets.
Can anyone tell me how to run k-means in Mahout with the new Spark bindings?
Thanks,
Meethu M
Hi,
I would like to make some contributions to MLlib. I have a few concerns
regarding the same.
1. Is there any reason for implementing the algorithms supported by MLlib in
Scala?
2. Will you accept contributions written in Python or Java?
Thanks,
Meethu M