[jira] [Created] (SPARK-38662) Spark loses k8s auth after some time
Alex created SPARK-38662:

Summary: Spark loses k8s auth after some time
Key: SPARK-38662
URL: https://issues.apache.org/jira/browse/SPARK-38662
Project: Spark
Issue Type: Bug
Components: Kubernetes
Affects Versions: 3.2.1
Reporter: Alex

Spark starts to fail with the error listed below after running for some time:

{noformat}
[2022-03-25 17:11:12,706] INFO (Logging.scala:57) - Adding decommission script to lifecycle
[2022-03-25 17:11:12,712] WARN (Logging.scala:90) - Exception when notifying snapshot subscriber.
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://cluster_endpoint/api/v1/namespaces/spark/pods. Message: Unauthorized! Token may have expired! Please log-in again. Unauthorized.
 at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:639)
 at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:576)
 at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:543)
 at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:504)
 at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:292)
 at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:893)
 at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:372)
 at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:86)
 at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:400)
 at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
 at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:382)
 at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36(ExecutorPodsAllocator.scala:346)
 at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36$adapted(ExecutorPodsAllocator.scala:339)
 at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
 at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
 at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.onNewSnapshots(ExecutorPodsAllocator.scala:339)
 at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3(ExecutorPodsAllocator.scala:117)
 at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3$adapted(ExecutorPodsAllocator.scala:117)
 at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber.org$apache$spark$scheduler$cluster$k8s$ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber$$processSnapshotsInternal(ExecutorPodsSnapshotsStoreImpl.scala:138)
 at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber.processSnapshots(ExecutorPodsSnapshotsStoreImpl.scala:126)
 at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl.$anonfun$addSubscriber$1(ExecutorPodsSnapshotsStoreImpl.scala:81)
 at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
 at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
 at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
 at java.base/java.lang.Thread.run(Thread.java:834)
{noformat}

This doesn't reproduce on 3.1.1 with the same configs, environment and workload.
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
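[Editorial note] The "Token may have expired" failure in SPARK-38662 above suggests a bounded-lifetime service-account token that the client keeps using past its expiry. As a hedged diagnostic sketch (not part of Spark or the fabric8 client; the helper names are mine), one can decode the `exp` claim of the mounted JWT to see how much lifetime it has left:

```python
import base64
import json
import time


def jwt_expiry(token: str) -> int:
    """Return the `exp` claim of a JWT without verifying its signature
    (diagnostic use only, never for auth decisions)."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded without padding; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return int(payload["exp"])


def seconds_until_expiry(token: str) -> float:
    """Remaining token lifetime in seconds; negative means expired."""
    return jwt_expiry(token) - time.time()
```

If the remaining lifetime is short, a long-running driver needs the client to re-read the token file rather than cache the value once at startup, which would explain why the error appears only after the job has been running for a while.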
[jira] [Updated] (SPARK-29077) submitting a SparkSession job fails on a spark://localhost:7077 url on Mac
[ https://issues.apache.org/jira/browse/SPARK-29077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

alex updated SPARK-29077:
-

Description:
When creating a Spark context against "spark://localhost:7077" the connection is refused; using the actual host name, for example "spark://myhostname.local:7077", works.

{code:java}
performance-meter {
  spark {
    appname = "test-harness"
    master = "spark://localhost:7077"
  }
}
{code}

{code:java}
val configRoot = "performance-meter"
val sparkSession = SparkSession.builder
  .appName(conf.getString(s"${configRoot}.spark.appname"))
  .master(conf.getString(s"${configRoot}.spark.master"))
{code}

This appears to be because some Macs have multiple network interfaces; at least that is the case on my Mac.

Recommended fix that seems to work locally: in /usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-master.sh and /usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-slaves.sh, add a case section for "Darwin":

{code:bash}
if [ "$SPARK_MASTER_HOST" = "" ]; then
  case `uname` in
    (SunOS)
      SPARK_MASTER_HOST="`/usr/sbin/check-hostname | awk '{print $NF}'`"
      ;;
    (Darwin)
      # 13-Sep-2019 alexshagiev: add Mac (Darwin) case to ensure Spark binds
      # on the localhost interface instead of an external interface.
      SPARK_MASTER_HOST="localhost"
      ;;
    (*)
      SPARK_MASTER_HOST="`hostname -f`"
      ;;
  esac
fi
{code}

was:
When creating a Spark context against "spark://localhost:7077" the connection is refused; using the actual host name, for example "spark://myhostname.local:7077", works.

{code:java}
performance-meter {
  spark {
    appname = "test-harness"
    master = "spark://localhost:7077"
  }
}
{code}

{code:java}
val configRoot = "performance-meter"
val sparkSession = SparkSession.builder
  .appName(conf.getString(s"${configRoot}.spark.appname"))
  .master(conf.getString(s"${configRoot}.spark.master"))
{code}

This appears to be because some Macs have multiple network interfaces; at least that is the case on my Mac.

Recommended fix that seems to work locally: in /usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-master.sh and /usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-slaves.sh, add a case section for "Darwin":

{code:java}
if [ "$SPARK_MASTER_HOST" = "" ]; then
  case `uname` in
    (SunOS)
      SPARK_MASTER_HOST="`/usr/sbin/check-hostname | awk '{print $NF}'`"
      ;;
    (Darwin)
      # 13-Sep-2019 alexshagiev: add Mac (Darwin) case to ensure Spark binds
      # on the localhost interface instead of an external interface.
      SPARK_MASTER_HOST="localhost"
      ;;
    (*)
      SPARK_MASTER_HOST="`hostname -f`"
      ;;
  esac
fi
{code}

> submitting a SparkSession job fails on a spark://localhost:7077 url on Mac
> ---
>
> Key: SPARK-29077
> URL: https://issues.apache.org/jira/browse/SPARK-29077
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit
> Affects Versions: 2.4.3
> Environment: Darwin 18.7.0 Darwin Kernel Version 18.7.0: Thu Jun 20 PDT 2019; root:xnu-4903.270.47~4/RELEASE_X86_64 x86_64
> Reporter: alex
> Priority: Major
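[Editorial note] The Darwin branch proposed for start-master.sh above can be mirrored in a few lines. This is a sketch of the reporter's selection logic only (the function name and the Python rendering are mine, not Spark code; the SunOS branch is omitted):

```python
import platform
import socket
from typing import Optional


def default_master_host(system: Optional[str] = None) -> str:
    """Choose the address a standalone master would bind to, mirroring the
    proposed start-master.sh case statement: bind to localhost on Darwin so
    spark://localhost:7077 is reachable, otherwise fall back to the
    machine's fully qualified name (the `hostname -f` default)."""
    system = system or platform.system()
    if system == "Darwin":
        # On multi-interface Macs the FQDN may resolve to an external
        # interface, so bind the loopback interface instead.
        return "localhost"
    return socket.getfqdn()
```

With this logic a Mac master is reachable at spark://localhost:7077, while Linux boxes keep the existing `hostname -f` behaviour.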
[jira] [Created] (SPARK-29077) submitting a SparkSession job fails on a spark://localhost:7077 url on Mac
alex created SPARK-29077:

Summary: submitting a SparkSession job fails on a spark://localhost:7077 url on Mac
Key: SPARK-29077
URL: https://issues.apache.org/jira/browse/SPARK-29077
Project: Spark
Issue Type: Bug
Components: Spark Submit
Affects Versions: 2.4.3
Environment: Darwin 18.7.0 Darwin Kernel Version 18.7.0: Thu Jun 20 PDT 2019; root:xnu-4903.270.47~4/RELEASE_X86_64 x86_64
Reporter: alex

When creating a Spark context against "spark://localhost:7077" the connection is refused; using the actual host name, for example "spark://myhostname.local:7077", works.

{code:java}
performance-meter {
  spark {
    appname = "test-harness"
    master = "spark://localhost:7077"
  }
}
{code}

{code:java}
val configRoot = "performance-meter"
val sparkSession = SparkSession.builder
  .appName(conf.getString(s"${configRoot}.spark.appname"))
  .master(conf.getString(s"${configRoot}.spark.master"))
{code}

This appears to be because some Macs have multiple network interfaces; at least that is the case on my Mac.

Recommended fix that seems to work locally: in file /usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-master.sh and /usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-slaves.sh add a section
[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943040#comment-14943040 ]

Alex commented on SPARK-2344:
-

Hi,

Sorry it took me a lot of time to get back to you. I haven't finished the code yet; it should be done in a week or so. I'll send you the finished version. BTW, how did you test your version of FCM? Is there a data set that you've used?

Alex

> Add Fuzzy C-Means algorithm to MLlib
>
> Key: SPARK-2344
> URL: https://issues.apache.org/jira/browse/SPARK-2344
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Alex
> Priority: Minor
> Labels: clustering
> Original Estimate: 1m
> Remaining Estimate: 1m
>
> I would like to add a FCM (Fuzzy C-Means) algorithm to MLlib.
> FCM is very similar to K-Means, which is already implemented; they differ only in the degree of relationship each point has with each cluster: in FCM the relationship is in the range [0..1], whereas in K-Means it is 0/1.
> As part of the implementation I would like to:
> - create a base class for K-Means and FCM
> - implement the relationship for each algorithm differently (in its class)
> I'd like this to be assigned to me.
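[Editorial note] The [0..1] membership degrees discussed in this thread come from the standard Fuzzy C-Means update rule. As a small illustrative sketch (not the implementation under discussion in the thread), the membership of a single point, given its distances to the cluster centres, is:

```python
def fcm_memberships(distances, m=2.0):
    """Standard Fuzzy C-Means membership update for one point:
    u_j = 1 / sum_k (d_j / d_k)^(2/(m-1)), where d_j is the distance to
    centre j and m > 1 is the fuzzifier. Assumes all distances are
    strictly positive (real implementations special-case a point that
    coincides with a centre). The resulting degrees lie in [0, 1] and
    sum to 1, unlike K-Means' hard 0/1 assignment."""
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** p for d_k in distances)
            for d_j in distances]
```

As m approaches 1 the memberships approach the hard K-Means assignment; as m grows they flatten toward 1/c for c clusters.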
[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577599#comment-14577599 ]

Alex commented on SPARK-2344:
-

Hi guys,

What is the status of this issue? Beniamino - are you planning to submit your version of the algorithm?
[jira] [Commented] (SPARK-6246) spark-ec2 can't handle clusters with 100 nodes
[ https://issues.apache.org/jira/browse/SPARK-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550637#comment-14550637 ]

Alex commented on SPARK-6246:
-

This can be fixed by replacing this line in ec2/spark_ec2.py:

{code}
statuses = conn.get_all_instance_status(instance_ids=[i.id for i in cluster_instances])
{code}

with the lines:

{code}
max_batch = 100
statuses = []
for j in range((len(cluster_instances) + max_batch - 1) // max_batch):
    statuses.extend(conn.get_all_instance_status(
        instance_ids=[i.id for i in cluster_instances[j * max_batch:(j + 1) * max_batch]]))
{code}

spark-ec2 can't handle clusters with 100 nodes

Key: SPARK-6246
URL: https://issues.apache.org/jira/browse/SPARK-6246
Project: Spark
Issue Type: Bug
Components: EC2
Affects Versions: 1.3.0
Reporter: Nicholas Chammas
Priority: Minor

This appears to be a new restriction, perhaps resulting from our upgrade of boto. Maybe it's a new restriction from EC2. Not sure yet. We didn't have this issue around the Spark 1.1.0 time frame from what I can remember. I'll track down where the issue is and when it started.

Attempting to launch a cluster with 100 slaves yields the following:

{code}
Spark AMI: ami-35b1885c
Launching instances...
Launched 100 slaves in us-east-1c, regid = r-9c408776
Launched master in us-east-1c, regid = r-92408778
Waiting for AWS to propagate instance metadata...
Waiting for cluster to enter 'ssh-ready' state.ERROR:boto:400 Bad Request
ERROR:boto:<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>InvalidRequest</Code><Message>101 exceeds the maximum number of instance IDs that can be specificied (100). Please specify fewer than 100 instance IDs.</Message></Error></Errors><RequestID>217fd6ff-9afa-4e91-86bc-ab16fcc442d8</RequestID></Response>
Traceback (most recent call last):
  File "./ec2/spark_ec2.py", line 1338, in <module>
    main()
  File "./ec2/spark_ec2.py", line 1330, in main
    real_main()
  File "./ec2/spark_ec2.py", line 1170, in real_main
    cluster_state='ssh-ready'
  File "./ec2/spark_ec2.py", line 795, in wait_for_cluster_state
    statuses = conn.get_all_instance_status(instance_ids=[i.id for i in cluster_instances])
  File "/path/apache/spark/ec2/lib/boto-2.34.0/boto/ec2/connection.py", line 737, in get_all_instance_status
    InstanceStatusSet, verb='POST')
  File "/path/apache/spark/ec2/lib/boto-2.34.0/boto/connection.py", line 1204, in get_object
    raise self.ResponseError(response.status, response.reason, body)
boto.exception.EC2ResponseError: EC2ResponseError: 400 Bad Request
<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>InvalidRequest</Code><Message>101 exceeds the maximum number of instance IDs that can be specificied (100). Please specify fewer than 100 instance IDs.</Message></Error></Errors><RequestID>217fd6ff-9afa-4e91-86bc-ab16fcc442d8</RequestID></Response>
{code}

This problem seems to be with {{get_all_instance_status()}}, though I am not sure if other methods are affected too.
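[Editorial note] The fix in the comment above is plain request batching. A generic sketch of the same idea (the helper name is mine, not part of spark_ec2.py):

```python
def batched(items, size=100):
    """Yield consecutive chunks of at most `size` items. EC2's
    DescribeInstanceStatus call rejects requests naming more than 100
    instance IDs, so a 101-node cluster has to split its status query
    into two requests and concatenate the results."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Applied to the boto call, each chunk goes through `conn.get_all_instance_status(...)` separately and the per-chunk results are extended into one list, exactly as in the proposed patch.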
[jira] [Commented] (SPARK-6246) spark-ec2 can't handle clusters with 100 nodes
[ https://issues.apache.org/jira/browse/SPARK-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551157#comment-14551157 ]

Alex commented on SPARK-6246:
-

[~shivaram] Done. This is my first PR. Do I have to do anything else to contribute to this ticket?
[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545027#comment-14545027 ]

Alex commented on SPARK-2344:
-

Hi,

How are you? I have a couple of questions:
1) When are you planning to submit the FCM to the main Spark branch? (I'm interested in working on top of it for feature-weight FCM improvements.)
2) Is there a way for Spark to make the RDD distribution based on input data columns rather than rows?

Thanks,
Alex
[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360560#comment-14360560 ]

Alex commented on SPARK-2344:
-

I see. Actually, I was counting on implementing this algorithm myself because it is a big part of my final project. Maybe I can add an improvement of FCM such as http://www.csee.usf.edu/~manohar/Papers/Pancreas/Improving%20FCM%20learning%20based%20on%20feature%20weight%20learning.pdf - what do you think?
[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14351631#comment-14351631 ]

Alex commented on SPARK-2344:
-

Hi,

How are you doing? I went over your implementation and it looks good. As I understood it, you don't use the membership matrix, but instead calculate the value on the fly - is that correct? It seems that you're pretty much finished with the FCM - what is left? Where can I contribute?

Alex
[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335577#comment-14335577 ] Alex commented on SPARK-2344:

You're right, of course. I did not mean to leave it there (only for now). Sorry that I did not answer you earlier; I have a test on Thursday and I am spending all my time studying. After Thursday I will take a look at your code.

Alex
[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14330276#comment-14330276 ] Alex commented on SPARK-2344:

Thanks for the link. Maybe we could join forces to make this algorithm work and get it added to MLlib.
[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322658#comment-14322658 ] Alex commented on SPARK-2344:

Hi, I'm also working on the implementation of FCM. You can find my work here: https://github.com/salexln/spark/tree/master/mllib/src/main/scala/org/apache/spark/mllib/clustering

Alex
[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110342#comment-14110342 ] Alex commented on SPARK-2344:

Hi, I'm currently working on the implementation of FCM myself. Also see https://issues.apache.org/jira/browse/SPARK-2430 (JIRA for a Standardized Clustering Algorithm API).
[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110518#comment-14110518 ] Alex commented on SPARK-2344:

This is my branch: https://github.com/salexln/spark
I have not added code there yet; I will do it in the next few days. Do you know if the Standardized Clustering Algorithm API and Framework was already submitted? I'm interested in it as well.
[jira] [Updated] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex updated SPARK-2344:

Description: appended "I'd like this to be assigned to me." to the existing description (otherwise unchanged)
Priority: Minor (was: Major)
Affects Version/s: (was: 1.0.0)