[jira] [Created] (SPARK-38662) Spark loses k8s auth after some time

2022-03-25 Thread Alex (Jira)
Alex created SPARK-38662:


 Summary: Spark loses k8s auth after some time
 Key: SPARK-38662
 URL: https://issues.apache.org/jira/browse/SPARK-38662
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.2.1
Reporter: Alex


Spark starts to fail with the error listed below after it has been running for some time:
{noformat}
[2022-03-25 17:11:12,706] INFO  (Logging.scala:57) - Adding decommission script to lifecycle
[2022-03-25 17:11:12,712] WARN  (Logging.scala:90) - Exception when notifying snapshot subscriber.
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://cluster_endpoint/api/v1/namespaces/spark/pods. Message: Unauthorized! Token may have expired! Please log-in again. Unauthorized.
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:639)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:576)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:543)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:504)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:292)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:893)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:372)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:86)
        at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:400)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
        at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:382)
        at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36(ExecutorPodsAllocator.scala:346)
        at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36$adapted(ExecutorPodsAllocator.scala:339)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.onNewSnapshots(ExecutorPodsAllocator.scala:339)
        at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3(ExecutorPodsAllocator.scala:117)
        at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3$adapted(ExecutorPodsAllocator.scala:117)
        at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber.org$apache$spark$scheduler$cluster$k8s$ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber$$processSnapshotsInternal(ExecutorPodsSnapshotsStoreImpl.scala:138)
        at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber.processSnapshots(ExecutorPodsSnapshotsStoreImpl.scala:126)
        at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl.$anonfun$addSubscriber$1(ExecutorPodsSnapshotsStoreImpl.scala:81)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
{noformat}

This doesn't reproduce on 3.1.1 with the same configs, environment and workload.
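
If the Kubernetes cluster rotates service-account tokens, one possible mitigation (a sketch only; the endpoint and token path below are illustrative, and this is not a confirmed fix for the regression) is to point the driver at a token file that the cluster refreshes on disk, rather than a token captured once at startup:
{noformat}
spark-submit \
  --master k8s://https://cluster_endpoint \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.authenticate.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token \
  ...
{noformat}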






[jira] [Updated] (SPARK-29077) submitting a SparkSession job fails on a spark://localhost:7077 url on Mac

2019-09-13 Thread alex (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

alex updated SPARK-29077:
-
Description: 
When creating a Spark context against a master running on localhost, the
connection is refused; using the actual host name instead, for example
"spark://myhostname.local:7077", works.
{code:java}
performance-meter {
  spark {
    appname = "test-harness"
    master = "spark://localhost:7077"
  }
}
{code}
{code:java}
val conf = ConfigFactory.load()  // assumed: conf comes from Typesafe Config
val configRoot = "performance-meter"
val sparkSession = SparkSession.builder
  .appName(conf.getString(s"${configRoot}.spark.appname"))
  .master(conf.getString(s"${configRoot}.spark.master"))
  .getOrCreate()
{code}

This appears to be due to some Macs having multiple network interfaces; at
least that is the case on my Mac. A recommended fix that seems to work locally:

In /usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-master.sh and
/usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-slaves.sh, add a case
section for "Darwin":

{code:bash}
if [ "$SPARK_MASTER_HOST" = "" ]; then
  case `uname` in
  (SunOS)
    SPARK_MASTER_HOST="`/usr/sbin/check-hostname | awk '{print $NF}'`"
    ;;
  (Darwin)
    # 13-Sep-2019 alexshagiev: add Mac (Darwin) case to ensure Spark binds
    # to the localhost interface instead of an external interface.
    SPARK_MASTER_HOST="localhost"
    ;;
  (*)
    SPARK_MASTER_HOST="`hostname -f`"
    ;;
  esac
fi
{code}
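
An alternative that avoids patching the Homebrew-managed scripts (a sketch of the same idea using standard Spark configuration, untested on this exact setup) is to set SPARK_MASTER_HOST in conf/spark-env.sh, which both scripts read:
{code:bash}
# conf/spark-env.sh (copy conf/spark-env.sh.template if it does not exist)
# Bind the standalone master to the loopback interface on macOS.
export SPARK_MASTER_HOST=localhost
{code}
start-master.sh also accepts a --host option, so {{sbin/start-master.sh --host localhost}} should have the same effect for the master.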

  was:
When creating a Spark context against a master running on localhost, the
connection is refused; using the actual host name instead, for example
"spark://myhostname.local:7077", works.
{code:java}
performance-meter {
  spark {
    appname = "test-harness"
    master = "spark://localhost:7077"
  }
}
{code}
{code:java}
val conf = ConfigFactory.load()  // assumed: conf comes from Typesafe Config
val configRoot = "performance-meter"
val sparkSession = SparkSession.builder
  .appName(conf.getString(s"${configRoot}.spark.appname"))
  .master(conf.getString(s"${configRoot}.spark.master"))
  .getOrCreate()
{code}

This appears to be due to some Macs having multiple network interfaces; at
least that is the case on my Mac. A recommended fix that seems to work locally:

In /usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-master.sh and
/usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-slaves.sh, add a case
section for "Darwin":

{code:java}
if [ "$SPARK_MASTER_HOST" = "" ]; then
  case `uname` in
  (SunOS)
    SPARK_MASTER_HOST="`/usr/sbin/check-hostname | awk '{print $NF}'`"
    ;;
  (Darwin)
    # 13-Sep-2019 alexshagiev: add Mac (Darwin) case to ensure Spark binds
    # to the localhost interface instead of an external interface.
    SPARK_MASTER_HOST="localhost"
    ;;
  (*)
    SPARK_MASTER_HOST="`hostname -f`"
    ;;
  esac
fi
{code}


> submitting a SparkSession job fails on a spark://localhost:7077 url on Mac
> ---
>
> Key: SPARK-29077
> URL: https://issues.apache.org/jira/browse/SPARK-29077
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.4.3
> Environment: Darwin  18.7.0 Darwin Kernel Version 
> 18.7.0: Thu Jun 20 
>  PDT 2019; root:xnu-4903.270.47~4/RELEASE_X86_64 x86_64
>Reporter: alex
>Priority: Major
>
> When creating a Spark context against a master running on localhost, the
> connection is refused; using the actual host name instead, for example
> "spark://myhostname.local:7077", works.
> {code:java}
> performance-meter {
>   spark {
>     appname = "test-harness"
>     master = "spark://localhost:7077"
>   }
> }
> {code}
> {code:java}
> val conf = ConfigFactory.load()  // assumed: conf comes from Typesafe Config
> val configRoot = "performance-meter"
> val sparkSession = SparkSession.builder
>   .appName(conf.getString(s"${configRoot}.spark.appname"))
>   .master(conf.getString(s"${configRoot}.spark.master"))
>   .getOrCreate()
> {code}
> This appears to be due to some Macs having multiple network interfaces; at
> least that is the case on my Mac. A recommended fix that seems to work locally:
> In /usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-master.sh and
> /usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-slaves.sh, add a case
> section for "Darwin":
> {code:bash}
> if [ "$SPARK_MASTER_HOST" = "" ]; then
>   case `uname` in
>   (SunOS)
>     SPARK_MASTER_HOST="`/usr/sbin/check-hostname | awk '{print $NF}'`"
>     ;;
>   (Darwin)
>     # 13-Sep-2019 alexshagiev: add Mac (Darwin) case to ensure Spark binds
>     # to the localhost interface instead of an external interface.
>     SPARK_MASTER_HOST="localhost"
>     ;;
>   (*)
>     SPARK_MASTER_HOST="`hostname -f`"
>     ;;
>   esac
> fi
> {code}






[jira] [Updated] (SPARK-29077) submitting a SparkSession job fails on a spark://localhost:7077 url on Mac

2019-09-13 Thread alex (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

alex updated SPARK-29077:
-
Description: 
When creating a Spark context against a master running on localhost, the
connection is refused; using the actual host name instead, for example
"spark://myhostname.local:7077", works.
{code:java}
performance-meter {
  spark {
    appname = "test-harness"
    master = "spark://localhost:7077"
  }
}
{code}
{code:java}
val conf = ConfigFactory.load()  // assumed: conf comes from Typesafe Config
val configRoot = "performance-meter"
val sparkSession = SparkSession.builder
  .appName(conf.getString(s"${configRoot}.spark.appname"))
  .master(conf.getString(s"${configRoot}.spark.master"))
  .getOrCreate()
{code}

This appears to be due to some Macs having multiple network interfaces; at
least that is the case on my Mac. A recommended fix that seems to work locally:

In /usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-master.sh and
/usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-slaves.sh, add a case
section for "Darwin":

{code:java}
if [ "$SPARK_MASTER_HOST" = "" ]; then
  case `uname` in
  (SunOS)
    SPARK_MASTER_HOST="`/usr/sbin/check-hostname | awk '{print $NF}'`"
    ;;
  (Darwin)
    # 13-Sep-2019 alexshagiev: add Mac (Darwin) case to ensure Spark binds
    # to the localhost interface instead of an external interface.
    SPARK_MASTER_HOST="localhost"
    ;;
  (*)
    SPARK_MASTER_HOST="`hostname -f`"
    ;;
  esac
fi
{code}

  was:
When creating a Spark context against a master running on localhost, the
connection is refused; using the actual host name instead, for example
"spark://myhostname.local:7077", works.
{code:java}
performance-meter {
  spark {
    appname = "test-harness"
    master = "spark://localhost:7077"
  }
}
{code}
{code:java}
val conf = ConfigFactory.load()  // assumed: conf comes from Typesafe Config
val configRoot = "performance-meter"
val sparkSession = SparkSession.builder
  .appName(conf.getString(s"${configRoot}.spark.appname"))
  .master(conf.getString(s"${configRoot}.spark.master"))
  .getOrCreate()
{code}

This appears to be due to some Macs having multiple network interfaces; at
least that is the case on my Mac. A recommended fix that seems to work locally:

In /usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-master.sh and
/usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-slaves.sh, add a case
section for "Darwin":

{code:java}
if [ "$SPARK_MASTER_HOST" = "" ]; then
  case `uname` in
  (SunOS)
    SPARK_MASTER_HOST="`/usr/sbin/check-hostname | awk '{print $NF}'`"
    ;;
  (Darwin)
    # 13-Sep-2019 alexshagiev: add Mac (Darwin) case to ensure Spark binds
    # to the localhost interface instead of an external port.
    SPARK_MASTER_HOST="localhost"
    ;;
  (*)
    SPARK_MASTER_HOST="`hostname -f`"
    ;;
  esac
fi
{code}


> submitting a SparkSession job fails on a spark://localhost:7077 url on Mac
> ---
>
> Key: SPARK-29077
> URL: https://issues.apache.org/jira/browse/SPARK-29077
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.4.3
> Environment: Darwin  18.7.0 Darwin Kernel Version 
> 18.7.0: Thu Jun 20 
>  PDT 2019; root:xnu-4903.270.47~4/RELEASE_X86_64 x86_64
>Reporter: alex
>Priority: Major
>
> When creating a Spark context against a master running on localhost, the
> connection is refused; using the actual host name instead, for example
> "spark://myhostname.local:7077", works.
> {code:java}
> performance-meter {
>   spark {
>     appname = "test-harness"
>     master = "spark://localhost:7077"
>   }
> }
> {code}
> {code:java}
> val conf = ConfigFactory.load()  // assumed: conf comes from Typesafe Config
> val configRoot = "performance-meter"
> val sparkSession = SparkSession.builder
>   .appName(conf.getString(s"${configRoot}.spark.appname"))
>   .master(conf.getString(s"${configRoot}.spark.master"))
>   .getOrCreate()
> {code}
> This appears to be due to some Macs having multiple network interfaces; at
> least that is the case on my Mac. A recommended fix that seems to work locally:
> In /usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-master.sh and
> /usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-slaves.sh, add a case
> section for "Darwin":
> {code:java}
> if [ "$SPARK_MASTER_HOST" = "" ]; then
>   case `uname` in
>   (SunOS)
>     SPARK_MASTER_HOST="`/usr/sbin/check-hostname | awk '{print $NF}'`"
>     ;;
>   (Darwin)
>     # 13-Sep-2019 alexshagiev: add Mac (Darwin) case to ensure Spark binds
>     # to the localhost interface instead of an external interface.
>     SPARK_MASTER_HOST="localhost"
>     ;;
>   (*)
>     SPARK_MASTER_HOST="`hostname -f`"
>     ;;
>   esac
> fi
> {code}






[jira] [Updated] (SPARK-29077) submitting a SparkSession job fails on a spark://localhost:7077 url on Mac

2019-09-13 Thread alex (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

alex updated SPARK-29077:
-
Description: 
When creating a Spark context against a master running on localhost, the
connection is refused; using the actual host name instead, for example
"spark://myhostname.local:7077", works.
{code:java}
performance-meter {
  spark {
    appname = "test-harness"
    master = "spark://localhost:7077"
  }
}
{code}
{code:java}
val conf = ConfigFactory.load()  // assumed: conf comes from Typesafe Config
val configRoot = "performance-meter"
val sparkSession = SparkSession.builder
  .appName(conf.getString(s"${configRoot}.spark.appname"))
  .master(conf.getString(s"${configRoot}.spark.master"))
  .getOrCreate()
{code}

This appears to be due to some Macs having multiple network interfaces; at
least that is the case on my Mac. A recommended fix that seems to work locally:

In /usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-master.sh and
/usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-slaves.sh, add a case
section for "Darwin":

{code:java}
if [ "$SPARK_MASTER_HOST" = "" ]; then
  case `uname` in
  (SunOS)
    SPARK_MASTER_HOST="`/usr/sbin/check-hostname | awk '{print $NF}'`"
    ;;
  (Darwin)
    # 13-Sep-2019 alexshagiev: add Mac (Darwin) case to ensure Spark binds
    # to the localhost interface instead of an external port.
    SPARK_MASTER_HOST="localhost"
    ;;
  (*)
    SPARK_MASTER_HOST="`hostname -f`"
    ;;
  esac
fi
{code}

  was:
When creating a Spark context against a master running on localhost, the
connection is refused; using the actual host name instead, for example
"spark://myhostname.local:7077", works.
{code:java}
performance-meter {
  spark {
    appname = "test-harness"
    master = "spark://localhost:7077"
  }
}
{code}
{code:java}
val conf = ConfigFactory.load()  // assumed: conf comes from Typesafe Config
val configRoot = "performance-meter"
val sparkSession = SparkSession.builder
  .appName(conf.getString(s"${configRoot}.spark.appname"))
  .master(conf.getString(s"${configRoot}.spark.master"))
  .getOrCreate()
{code}

This appears to be due to some Macs having multiple network interfaces; at
least that is the case on my Mac. A recommended fix that seems to work locally:

In /usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-master.sh and
/usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-slaves.sh, add a section



> submitting a SparkSession job fails on a spark://localhost:7077 url on Mac
> ---
>
> Key: SPARK-29077
> URL: https://issues.apache.org/jira/browse/SPARK-29077
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.4.3
> Environment: Darwin  18.7.0 Darwin Kernel Version 
> 18.7.0: Thu Jun 20 
>  PDT 2019; root:xnu-4903.270.47~4/RELEASE_X86_64 x86_64
>Reporter: alex
>Priority: Major
>
> When creating a Spark context against a master running on localhost, the
> connection is refused; using the actual host name instead, for example
> "spark://myhostname.local:7077", works.
> {code:java}
> performance-meter {
>   spark {
>     appname = "test-harness"
>     master = "spark://localhost:7077"
>   }
> }
> {code}
> {code:java}
> val conf = ConfigFactory.load()  // assumed: conf comes from Typesafe Config
> val configRoot = "performance-meter"
> val sparkSession = SparkSession.builder
>   .appName(conf.getString(s"${configRoot}.spark.appname"))
>   .master(conf.getString(s"${configRoot}.spark.master"))
>   .getOrCreate()
> {code}
> This appears to be due to some Macs having multiple network interfaces; at
> least that is the case on my Mac. A recommended fix that seems to work locally:
> In /usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-master.sh and
> /usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-slaves.sh, add a case
> section for "Darwin":
> {code:java}
> if [ "$SPARK_MASTER_HOST" = "" ]; then
>   case `uname` in
>   (SunOS)
>     SPARK_MASTER_HOST="`/usr/sbin/check-hostname | awk '{print $NF}'`"
>     ;;
>   (Darwin)
>     # 13-Sep-2019 alexshagiev: add Mac (Darwin) case to ensure Spark binds
>     # to the localhost interface instead of an external port.
>     SPARK_MASTER_HOST="localhost"
>     ;;
>   (*)
>     SPARK_MASTER_HOST="`hostname -f`"
>     ;;
>   esac
> fi
> {code}






[jira] [Created] (SPARK-29077) submitting a SparkSession job fails on a spark://localhost:7077 url on Mac

2019-09-13 Thread alex (Jira)
alex created SPARK-29077:


 Summary: submitting a SparkSession job fails on a spark://localhost:7077 url on Mac
 Key: SPARK-29077
 URL: https://issues.apache.org/jira/browse/SPARK-29077
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 2.4.3
 Environment: Darwin  18.7.0 Darwin Kernel Version 
18.7.0: Thu Jun 20 

 PDT 2019; root:xnu-4903.270.47~4/RELEASE_X86_64 x86_64
Reporter: alex


When creating a Spark context against a master running on localhost, the
connection is refused; using the actual host name instead, for example
"spark://myhostname.local:7077", works.
{code:java}
performance-meter {
  spark {
    appname = "test-harness"
    master = "spark://localhost:7077"
  }
}
{code}
{code:java}
val conf = ConfigFactory.load()  // assumed: conf comes from Typesafe Config
val configRoot = "performance-meter"
val sparkSession = SparkSession.builder
  .appName(conf.getString(s"${configRoot}.spark.appname"))
  .master(conf.getString(s"${configRoot}.spark.master"))
  .getOrCreate()
{code}

This appears to be due to some Macs having multiple network interfaces; at
least that is the case on my Mac. A recommended fix that seems to work locally:

In /usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-master.sh and
/usr/local/Cellar/apache-spark/2.4.3/libexec/sbin/start-slaves.sh, add a section







[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib

2015-10-05 Thread Alex (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943040#comment-14943040
 ] 

Alex commented on SPARK-2344:
-

Hi,

Sorry it took me a long time to get back to you.
I haven't finished the code yet; it should be done in a week or so. I'll
send you the finished version.

BTW, how did you test your version of FCM? Is there a data set that
you've used?


Alex




> Add Fuzzy C-Means algorithm to MLlib
> 
>
> Key: SPARK-2344
> URL: https://issues.apache.org/jira/browse/SPARK-2344
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Alex
>Priority: Minor
>  Labels: clustering
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> I would like to add an FCM (Fuzzy C-Means) algorithm to MLlib.
> FCM is very similar to K-Means, which is already implemented; they differ
> only in the degree of relationship each point has with each cluster
> (in FCM the relationship is in the range [0..1], whereas in K-Means it is 0/1).
> As part of the implementation I would like to:
> - create a base class for K-Means and FCM
> - implement the relationship for each algorithm differently (in its class)
> I'd like this to be assigned to me.
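
For readers unfamiliar with the algorithm under discussion, a minimal NumPy sketch of the standard FCM updates (illustrative only; this is not the implementation proposed in this ticket, and the function names are mine):
{code}
import numpy as np

def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership: u[i, j] = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)  # (n, c)
    d = np.fmax(d, 1e-12)                    # guard points sitting exactly on a center
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)           # rows sum to 1, entries in [0..1]

def fcm_centers(points, u, m=2.0):
    """Centers are the u^m-weighted means; K-Means is the 0/1-membership limit."""
    w = u ** m
    return (w.T @ points) / w.sum(axis=0)[:, None]
{code}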






[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib

2015-06-08 Thread Alex (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577599#comment-14577599
 ] 

Alex commented on SPARK-2344:
-

Hi guys,
What is the status of this issue? 

Beniamino - are you planning to submit your version of the algorithm?


 Add Fuzzy C-Means algorithm to MLlib
 

 Key: SPARK-2344
 URL: https://issues.apache.org/jira/browse/SPARK-2344
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Alex
Priority: Minor
  Labels: clustering
   Original Estimate: 1m
  Remaining Estimate: 1m

 I would like to add an FCM (Fuzzy C-Means) algorithm to MLlib.
 FCM is very similar to K-Means, which is already implemented; they differ
 only in the degree of relationship each point has with each cluster
 (in FCM the relationship is in the range [0..1], whereas in K-Means it is 0/1).
 As part of the implementation I would like to:
 - create a base class for K-Means and FCM
 - implement the relationship for each algorithm differently (in its class)
 I'd like this to be assigned to me.






[jira] [Commented] (SPARK-6246) spark-ec2 can't handle clusters with > 100 nodes

2015-05-19 Thread Alex (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550637#comment-14550637
 ] 

Alex commented on SPARK-6246:
-

This can be fixed by replacing the following line in ec2/spark_ec2.py:

{code}
statuses = conn.get_all_instance_status(instance_ids=[i.id for i in cluster_instances])
{code}

with the lines:

{code}
max_batch = 100
statuses = []
for j in range((len(cluster_instances) + max_batch - 1) // max_batch):
    statuses.extend(conn.get_all_instance_status(
        instance_ids=[i.id for i in cluster_instances[j * max_batch:(j + 1) * max_batch]]))
{code}
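
Equivalently, the batching can be factored into a small helper (a sketch; {{chunked}} is a name introduced here, not something in spark_ec2.py). The motivation is that EC2 rejects DescribeInstanceStatus requests carrying more than 100 instance IDs, as the quoted error below shows:

{code}
def chunked(seq, size=100):
    """Yield successive slices of at most `size` items; EC2 caps
    DescribeInstanceStatus at 100 instance IDs per request."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

statuses = []
for batch in chunked(cluster_instances):
    statuses.extend(conn.get_all_instance_status(instance_ids=[i.id for i in batch]))
{code}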

 spark-ec2 can't handle clusters with > 100 nodes
 

 Key: SPARK-6246
 URL: https://issues.apache.org/jira/browse/SPARK-6246
 Project: Spark
  Issue Type: Bug
  Components: EC2
Affects Versions: 1.3.0
Reporter: Nicholas Chammas
Priority: Minor

 This appears to be a new restriction, perhaps resulting from our upgrade of 
 boto. Maybe it's a new restriction from EC2. Not sure yet.
 We didn't have this issue around the Spark 1.1.0 time frame from what I can 
 remember. I'll track down where the issue is and when it started.
 Attempting to launch a cluster with 100 slaves yields the following:
 {code}
 Spark AMI: ami-35b1885c
 Launching instances...
 Launched 100 slaves in us-east-1c, regid = r-9c408776
 Launched master in us-east-1c, regid = r-92408778
 Waiting for AWS to propagate instance metadata...
 Waiting for cluster to enter 'ssh-ready' state.ERROR:boto:400 Bad Request
 ERROR:boto:<?xml version="1.0" encoding="UTF-8"?>
 <Response><Errors><Error><Code>InvalidRequest</Code><Message>101 exceeds the
 maximum number of instance IDs that can be specificied (100). Please specify
 fewer than 100 instance
 IDs.</Message></Error></Errors><RequestID>217fd6ff-9afa-4e91-86bc-ab16fcc442d8</RequestID></Response>
 Traceback (most recent call last):
   File "./ec2/spark_ec2.py", line 1338, in <module>
     main()
   File "./ec2/spark_ec2.py", line 1330, in main
     real_main()
   File "./ec2/spark_ec2.py", line 1170, in real_main
     cluster_state='ssh-ready'
   File "./ec2/spark_ec2.py", line 795, in wait_for_cluster_state
     statuses = conn.get_all_instance_status(instance_ids=[i.id for i in cluster_instances])
   File "/path/apache/spark/ec2/lib/boto-2.34.0/boto/ec2/connection.py", line 737, in get_all_instance_status
     InstanceStatusSet, verb='POST')
   File "/path/apache/spark/ec2/lib/boto-2.34.0/boto/connection.py", line 1204, in get_object
     raise self.ResponseError(response.status, response.reason, body)
 boto.exception.EC2ResponseError: EC2ResponseError: 400 Bad Request
 <?xml version="1.0" encoding="UTF-8"?>
 <Response><Errors><Error><Code>InvalidRequest</Code><Message>101 exceeds the
 maximum number of instance IDs that can be specificied (100). Please specify
 fewer than 100 instance
 IDs.</Message></Error></Errors><RequestID>217fd6ff-9afa-4e91-86bc-ab16fcc442d8</RequestID></Response>
 {code}
 This problem seems to be with {{get_all_instance_status()}}, though I am not 
 sure if other methods are affected too.






[jira] [Commented] (SPARK-6246) spark-ec2 can't handle clusters with > 100 nodes

2015-05-19 Thread Alex (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551157#comment-14551157
 ] 

Alex commented on SPARK-6246:
-

[~shivaram] Done. This is my first PR. Do I have to do anything else to 
contribute to this ticket?

 spark-ec2 can't handle clusters with > 100 nodes
 

 Key: SPARK-6246
 URL: https://issues.apache.org/jira/browse/SPARK-6246
 Project: Spark
  Issue Type: Bug
  Components: EC2
Affects Versions: 1.3.0
Reporter: Nicholas Chammas
Priority: Minor

 This appears to be a new restriction, perhaps resulting from our upgrade of 
 boto. Maybe it's a new restriction from EC2. Not sure yet.
 We didn't have this issue around the Spark 1.1.0 time frame from what I can 
 remember. I'll track down where the issue is and when it started.
 Attempting to launch a cluster with 100 slaves yields the following:
 {code}
 Spark AMI: ami-35b1885c
 Launching instances...
 Launched 100 slaves in us-east-1c, regid = r-9c408776
 Launched master in us-east-1c, regid = r-92408778
 Waiting for AWS to propagate instance metadata...
 Waiting for cluster to enter 'ssh-ready' state.ERROR:boto:400 Bad Request
 ERROR:boto:<?xml version="1.0" encoding="UTF-8"?>
 <Response><Errors><Error><Code>InvalidRequest</Code><Message>101 exceeds the
 maximum number of instance IDs that can be specificied (100). Please specify
 fewer than 100 instance
 IDs.</Message></Error></Errors><RequestID>217fd6ff-9afa-4e91-86bc-ab16fcc442d8</RequestID></Response>
 Traceback (most recent call last):
   File "./ec2/spark_ec2.py", line 1338, in <module>
     main()
   File "./ec2/spark_ec2.py", line 1330, in main
     real_main()
   File "./ec2/spark_ec2.py", line 1170, in real_main
     cluster_state='ssh-ready'
   File "./ec2/spark_ec2.py", line 795, in wait_for_cluster_state
     statuses = conn.get_all_instance_status(instance_ids=[i.id for i in cluster_instances])
   File "/path/apache/spark/ec2/lib/boto-2.34.0/boto/ec2/connection.py", line 737, in get_all_instance_status
     InstanceStatusSet, verb='POST')
   File "/path/apache/spark/ec2/lib/boto-2.34.0/boto/connection.py", line 1204, in get_object
     raise self.ResponseError(response.status, response.reason, body)
 boto.exception.EC2ResponseError: EC2ResponseError: 400 Bad Request
 <?xml version="1.0" encoding="UTF-8"?>
 <Response><Errors><Error><Code>InvalidRequest</Code><Message>101 exceeds the
 maximum number of instance IDs that can be specificied (100). Please specify
 fewer than 100 instance
 IDs.</Message></Error></Errors><RequestID>217fd6ff-9afa-4e91-86bc-ab16fcc442d8</RequestID></Response>
 {code}
 This problem seems to be with {{get_all_instance_status()}}, though I am not 
 sure if other methods are affected too.






[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib

2015-05-15 Thread Alex (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545027#comment-14545027
 ] 

Alex commented on SPARK-2344:
-

Hi,

How are you? I have a couple of questions:

1) When are you planning to submit the FCM to the main Spark branch? (I'm
interested in working on top of it for Feature Weight FCM improvements.)

2) Is there a way for Spark to distribute an RDD based on input data
columns rather than rows?


Thanks,
Alex


 Add Fuzzy C-Means algorithm to MLlib
 

 Key: SPARK-2344
 URL: https://issues.apache.org/jira/browse/SPARK-2344
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Alex
Priority: Minor
  Labels: clustering
   Original Estimate: 1m
  Remaining Estimate: 1m

 I would like to add an FCM (Fuzzy C-Means) algorithm to MLlib.
 FCM is very similar to K-Means, which is already implemented; they differ
 only in the degree of relationship each point has with each cluster
 (in FCM the relationship is in the range [0..1], whereas in K-Means it is 0/1).
 As part of the implementation I would like to:
 - create a base class for K-Means and FCM
 - implement the relationship for each algorithm differently (in its class)
 I'd like this to be assigned to me.






[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib

2015-03-13 Thread Alex (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360560#comment-14360560
 ] 

Alex commented on SPARK-2344:
-

I see. Actually, I was counting on implementing this algorithm myself,
because it is a big part of my final project.

Maybe I can add an improvement of FCM such as
http://www.csee.usf.edu/~manohar/Papers/Pancreas/Improving%20FCM%20learning%20based%20on%20feature%20weight%20learning.pdf

What do you think?


 Add Fuzzy C-Means algorithm to MLlib
 

 Key: SPARK-2344
 URL: https://issues.apache.org/jira/browse/SPARK-2344
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Alex
Priority: Minor
  Labels: clustering
   Original Estimate: 1m
  Remaining Estimate: 1m

 I would like to add an FCM (Fuzzy C-Means) algorithm to MLlib.
 FCM is very similar to K-Means, which is already implemented; they differ
 only in the degree of relationship each point has with each cluster
 (in FCM the relationship is in the range [0..1], whereas in K-Means it is 0/1).
 As part of the implementation I would like to:
 - create a base class for K-Means and FCM
 - implement the relationship for each algorithm differently (in its class)
 I'd like this to be assigned to me.






[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib

2015-03-07 Thread Alex (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14351631#comment-14351631
 ] 

Alex commented on SPARK-2344:
-

Hi,
How are you doing? I went over your implementation and it looks good. As I
understood it, you don't store the membership matrix, but instead calculate
the values on the fly; is that correct?

It seems that you're pretty much finished with the FCM - what is left?
Where can I contribute?


Alex


 Add Fuzzy C-Means algorithm to MLlib
 

 Key: SPARK-2344
 URL: https://issues.apache.org/jira/browse/SPARK-2344
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Alex
Priority: Minor
  Labels: clustering
   Original Estimate: 1m
  Remaining Estimate: 1m

 I would like to add an FCM (Fuzzy C-Means) algorithm to MLlib.
 FCM is very similar to K-Means, which is already implemented; they differ
 only in the degree of relationship each point has with each cluster
 (in FCM the relationship is in the range [0..1], whereas in K-Means it is 0/1).
 As part of the implementation I would like to:
 - create a base class for K-Means and FCM
 - implement the relationship for each algorithm differently (in its class)
 I'd like this to be assigned to me.






[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib

2015-02-24 Thread Alex (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335577#comment-14335577
 ] 

Alex commented on SPARK-2344:
-

You're right of course - I did not mean to leave it there (only for now).

Sorry that I did not answer you earlier; I have a test on Thursday and I
am spending all my time studying. After Thursday I will take a look at your
code.



Alex




 Add Fuzzy C-Means algorithm to MLlib
 

 Key: SPARK-2344
 URL: https://issues.apache.org/jira/browse/SPARK-2344
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Alex
Priority: Minor
  Labels: clustering
   Original Estimate: 1m
  Remaining Estimate: 1m

 I would like to add an FCM (Fuzzy C-Means) algorithm to MLlib.
 FCM is very similar to K-Means, which is already implemented; they differ
 only in the degree of relationship each point has with each cluster
 (in FCM the relationship is in the range [0..1], whereas in K-Means it is 0/1).
 As part of the implementation I would like to:
 - create a base class for K-Means and FCM
 - implement the relationship for each algorithm differently (in its class)
 I'd like this to be assigned to me.






[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib

2015-02-21 Thread Alex (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14330276#comment-14330276
 ] 

Alex commented on SPARK-2344:
-

Thanks for the link.
Maybe we could join forces in order to make this algorithm work and get it
added to MLlib.


 Add Fuzzy C-Means algorithm to MLlib
 

 Key: SPARK-2344
 URL: https://issues.apache.org/jira/browse/SPARK-2344
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Alex
Priority: Minor
   Original Estimate: 1m
  Remaining Estimate: 1m

 I would like to add an FCM (Fuzzy C-Means) algorithm to MLlib.
 FCM is very similar to K-Means, which is already implemented; they differ
 only in the degree of relationship each point has with each cluster
 (in FCM the relationship is in the range [0..1], whereas in K-Means it is 0/1).
 As part of the implementation I would like to:
 - create a base class for K-Means and FCM
 - implement the relationship for each algorithm differently (in its class)
 I'd like this to be assigned to me.






[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib

2015-02-16 Thread Alex (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322658#comment-14322658
 ] 

Alex commented on SPARK-2344:
-

Hi,

I'm also working on an implementation of FCM.
You can find my work here:
https://github.com/salexln/spark/tree/master/mllib/src/main/scala/org/apache/spark/mllib/clustering



Alex


 Add Fuzzy C-Means algorithm to MLlib
 

 Key: SPARK-2344
 URL: https://issues.apache.org/jira/browse/SPARK-2344
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Alex
Priority: Minor
   Original Estimate: 1m
  Remaining Estimate: 1m

 I would like to add an FCM (Fuzzy C-Means) algorithm to MLlib.
 FCM is very similar to K-Means, which is already implemented; they differ
 only in the degree of relationship each point has with each cluster
 (in FCM the relationship is in the range [0..1], whereas in K-Means it is 0/1).
 As part of the implementation I would like to:
 - create a base class for K-Means and FCM
 - implement the relationship for each algorithm differently (in its class)
 I'd like this to be assigned to me.






[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib

2014-08-26 Thread Alex (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110342#comment-14110342
 ] 

Alex commented on SPARK-2344:
-

Hi,
I'm currently working on the implementation of FCM myself.
Also see this: https://issues.apache.org/jira/browse/SPARK-2430
(JIRA for the Standardized Clustering Algorithm API)

 Add Fuzzy C-Means algorithm to MLlib
 

 Key: SPARK-2344
 URL: https://issues.apache.org/jira/browse/SPARK-2344
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Alex
Priority: Minor
   Original Estimate: 1m
  Remaining Estimate: 1m

 I would like to add an FCM (Fuzzy C-Means) algorithm to MLlib.
 FCM is very similar to K-Means, which is already implemented; they differ
 only in the degree of relationship each point has with each cluster
 (in FCM the relationship is in the range [0..1], whereas in K-Means it is 0/1).
 As part of the implementation I would like to:
 - create a base class for K-Means and FCM
 - implement the relationship for each algorithm differently (in its class)
 I'd like this to be assigned to me.






[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib

2014-08-26 Thread Alex (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110518#comment-14110518
 ] 

Alex commented on SPARK-2344:
-

This is my branch: https://github.com/salexln/spark
I have not added code there yet - I will do it in the next few days...


Do you know if the Standardized Clustering Algorithm API and Framework has
already been submitted? I'm interested in it as well.


 Add Fuzzy C-Means algorithm to MLlib
 

 Key: SPARK-2344
 URL: https://issues.apache.org/jira/browse/SPARK-2344
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Alex
Priority: Minor
   Original Estimate: 1m
  Remaining Estimate: 1m

 I would like to add an FCM (Fuzzy C-Means) algorithm to MLlib.
 FCM is very similar to K-Means, which is already implemented; they differ
 only in the degree of relationship each point has with each cluster
 (in FCM the relationship is in the range [0..1], whereas in K-Means it is 0/1).
 As part of the implementation I would like to:
 - create a base class for K-Means and FCM
 - implement the relationship for each algorithm differently (in its class)
 I'd like this to be assigned to me.






[jira] [Updated] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib

2014-07-03 Thread Alex (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex updated SPARK-2344:


  Description: 
I would like to add an FCM (Fuzzy C-Means) algorithm to MLlib.

FCM is very similar to K-Means, which is already implemented; they differ
only in the degree of relationship each point has with each cluster
(in FCM the relationship is in the range [0..1], whereas in K-Means it is 0/1).

As part of the implementation I would like to:
- create a base class for K-Means and FCM
- implement the relationship for each algorithm differently (in its class)

I'd like this to be assigned to me.

  was:
I would like to add an FCM (Fuzzy C-Means) algorithm to MLlib.

FCM is very similar to K-Means, which is already implemented; they differ
only in the degree of relationship each point has with each cluster
(in FCM the relationship is in the range [0..1], whereas in K-Means it is 0/1).

As part of the implementation I would like to:
- create a base class for K-Means and FCM
- implement the relationship for each algorithm differently (in its class)

 Priority: Minor  (was: Major)
Affects Version/s: (was: 1.0.0)

 Add Fuzzy C-Means algorithm to MLlib
 

 Key: SPARK-2344
 URL: https://issues.apache.org/jira/browse/SPARK-2344
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Alex
Priority: Minor
   Original Estimate: 1m
  Remaining Estimate: 1m

 I would like to add an FCM (Fuzzy C-Means) algorithm to MLlib.
 FCM is very similar to K-Means, which is already implemented; they differ
 only in the degree of relationship each point has with each cluster
 (in FCM the relationship is in the range [0..1], whereas in K-Means it is 0/1).
 As part of the implementation I would like to:
 - create a base class for K-Means and FCM
 - implement the relationship for each algorithm differently (in its class)
 I'd like this to be assigned to me.


