[jira] [Created] (SPARK-3228) When DStream saves RDDs to HDFS, don't create a directory and an empty file if no data was received from the source in the batch duration
Leo created SPARK-3228: -- Summary: When DStream saves RDDs to HDFS, don't create a directory and an empty file if no data was received from the source in the batch duration Key: SPARK-3228 URL: https://issues.apache.org/jira/browse/SPARK-3228 Project: Spark Issue Type: Improvement Components: Streaming Reporter: Leo When I use a DStream to save files to HDFS, it creates a directory and an empty file named "_SUCCESS" for each job generated in the batch duration. But if there is no data from the source for a long time and the duration is very short (e.g. 10s), it creates a great many directories and empty files in HDFS. I don't think this is necessary. So I want to modify DStream's methods saveAsObjectFiles and saveAsTextFiles so that they create the directory and files only when the RDD's partition count is > 0. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
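A minimal sketch of the proposed guard, written as a user-side workaround with foreachRDD until saveAsTextFiles itself checks for data; the partition-count test mirrors the proposal above, and the output naming only approximates the prefix-TIME.suffix convention of saveAsTextFiles:
{code}
import org.apache.spark.streaming.dstream.DStream

// Workaround sketch: write a batch's RDD only when it has partitions,
// the check proposed in this ticket. A stricter "has data" test would
// be rdd.take(1).nonEmpty, at the cost of an extra job per batch.
def saveNonEmpty(lines: DStream[String], prefix: String, suffix: String) {
  lines.foreachRDD { (rdd, time) =>
    if (rdd.partitions.size > 0) {
      rdd.saveAsTextFile(prefix + "-" + time.milliseconds + suffix)
    }
  }
}
{code}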
[jira] [Assigned] (SPARK-2886) Use more specific actor system name than "spark"
[ https://issues.apache.org/jira/browse/SPARK-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reassigned SPARK-2886: Assignee: Andrew Or > Use more specific actor system name than "spark" > > > Key: SPARK-2886 > URL: https://issues.apache.org/jira/browse/SPARK-2886 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.0.2 >Reporter: Andrew Or >Assignee: Andrew Or >Priority: Minor > Fix For: 1.1.0 > > > With a recent PR (https://github.com/apache/spark/pull/1777) we log the name > of the actor system when it binds to a port. We should use a more specific > name instead of "spark." -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-2886) Use more specific actor system name than "spark"
[ https://issues.apache.org/jira/browse/SPARK-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-2886. -- Resolution: Fixed Fixed by https://github.com/apache/spark/pull/1810 > Use more specific actor system name than "spark" > > > Key: SPARK-2886 > URL: https://issues.apache.org/jira/browse/SPARK-2886 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.0.2 >Reporter: Andrew Or >Priority: Minor > Fix For: 1.1.0 > > > With a recent PR (https://github.com/apache/spark/pull/1777) we log the name > of the actor system when it binds to a port. We should use a more specific > name instead of "spark." -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3167) Port recent spark-submit changes to windows
[ https://issues.apache.org/jira/browse/SPARK-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110382#comment-14110382 ] Apache Spark commented on SPARK-3167: - User 'andrewor14' has created a pull request for this issue: https://github.com/apache/spark/pull/2129 > Port recent spark-submit changes to windows > --- > > Key: SPARK-3167 > URL: https://issues.apache.org/jira/browse/SPARK-3167 > Project: Spark > Issue Type: Bug >Reporter: Patrick Wendell >Assignee: Andrew Or >Priority: Blocker > -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3145) Hive on Spark umbrella
[ https://issues.apache.org/jira/browse/SPARK-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110362#comment-14110362 ] Patrick Wendell commented on SPARK-3145: [~bcwalrus] hey BC, I made a minor change to the title, since this concerns broader issues than dependencies. Hope that's alright! > Hive on Spark umbrella > -- > > Key: SPARK-3145 > URL: https://issues.apache.org/jira/browse/SPARK-3145 > Project: Spark > Issue Type: Epic > Components: Build, Shuffle, Spark Core >Reporter: bc Wong > > This is an umbrella JIRA to point to dependencies & asks from the Hive-on-Spark > project (HIVE-7292). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3213) spark_ec2.py cannot find slave instances launched with "Launch More Like This"
[ https://issues.apache.org/jira/browse/SPARK-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110363#comment-14110363 ] Joseph K. Bradley commented on SPARK-3213: -- Vida, that sounds fine; I'll show you how I did it tomorrow. (I think it was not a temporary thing since I have not seen the spot instances get tags like that before.) Patrick, good to know! I'll use the script from now on. > spark_ec2.py cannot find slave instances launched with "Launch More Like This" > -- > > Key: SPARK-3213 > URL: https://issues.apache.org/jira/browse/SPARK-3213 > Project: Spark > Issue Type: Improvement > Components: EC2 >Affects Versions: 1.1.0 >Reporter: Joseph K. Bradley >Priority: Blocker > Attachments: Screen Shot 2014-08-25 at 6.45.35 PM.png > > > spark_ec2.py cannot find all slave instances. In particular: > * I created a master & slave and configured them. > * I created new slave instances from the original slave ("Launch More Like > This"). > * I tried to relaunch the cluster, and it could only find the original slave. > Old versions of the script worked. The latest working commit which edited > that .py script is: a0bcbc159e89be868ccc96175dbf1439461557e1 > There may be a problem with this PR: > [https://github.com/apache/spark/pull/1899]. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3145) Hive on Spark umbrella
[ https://issues.apache.org/jira/browse/SPARK-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3145: --- Summary: Hive on Spark umbrella (was: Hive on Spark dependency umbrella) > Hive on Spark umbrella > -- > > Key: SPARK-3145 > URL: https://issues.apache.org/jira/browse/SPARK-3145 > Project: Spark > Issue Type: Epic > Components: Build, Shuffle, Spark Core >Reporter: bc Wong > > This is an umbrella JIRA to point to dependencies & asks from the Hive-on-Spark > project (HIVE-7292). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110359#comment-14110359 ] Yu Ishikawa commented on SPARK-2344: Hi Alex, Noted with thanks! I am very interested in the design of a standardized clustering algorithm API. I'm trying to implement an approximate hierarchical clustering algorithm now too, and a standardized API would help me implement it. I look forward to seeing this included in MLlib. https://issues.apache.org/jira/browse/SPARK-2966 If you have a branch implementing FCM on GitHub, would you please let me know? > Add Fuzzy C-Means algorithm to MLlib > > > Key: SPARK-2344 > URL: https://issues.apache.org/jira/browse/SPARK-2344 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Alex >Priority: Minor > Original Estimate: 1m > Remaining Estimate: 1m > > I would like to add an FCM (Fuzzy C-Means) algorithm to MLlib. > FCM is very similar to K-Means, which is already implemented; they > differ only in the degree of relationship each point has with each cluster: > in FCM the relationship is in the range [0..1], whereas in K-Means it is 0/1. > As part of the implementation I would like to: > - create a base class for K-Means and FCM > - implement the relationship for each algorithm differently (in its class) > I'd like this to be assigned to me. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
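For reference, the membership update that gives FCM its [0..1] degrees, as a standalone Scala sketch rather than code from any Spark branch; `m` is the fuzziness exponent, and strictly positive distances are assumed:
{code}
// Fuzzy c-means membership of one point across all centers:
//   u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1))
// where d_j is the distance from the point to center j. The degrees
// lie in [0, 1] and sum to 1 across centers. A zero distance (point
// sitting on a center) needs special-casing, omitted here for brevity.
def memberships(dists: Array[Double], m: Double = 2.0): Array[Double] =
  dists.map { dj =>
    1.0 / dists.map(dk => math.pow(dj / dk, 2.0 / (m - 1.0))).sum
  }
{code}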
[jira] [Updated] (SPARK-3178) setting SPARK_WORKER_MEMORY to a value without a label (m or g) sets the worker memory limit to zero
[ https://issues.apache.org/jira/browse/SPARK-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3178: --- Labels: starter (was: ) > setting SPARK_WORKER_MEMORY to a value without a label (m or g) sets the > worker memory limit to zero > > > Key: SPARK-3178 > URL: https://issues.apache.org/jira/browse/SPARK-3178 > Project: Spark > Issue Type: Bug > Environment: osx >Reporter: Jon Haddad > Labels: starter > > This should either default to m or just completely fail. Starting a worker > with zero memory isn't very helpful. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3213) spark_ec2.py cannot find slave instances launched with "Launch More Like This"
[ https://issues.apache.org/jira/browse/SPARK-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3213: --- Issue Type: Improvement (was: Bug) > spark_ec2.py cannot find slave instances launched with "Launch More Like This" > -- > > Key: SPARK-3213 > URL: https://issues.apache.org/jira/browse/SPARK-3213 > Project: Spark > Issue Type: Improvement > Components: EC2 >Affects Versions: 1.1.0 >Reporter: Joseph K. Bradley >Priority: Blocker > Attachments: Screen Shot 2014-08-25 at 6.45.35 PM.png > > > spark_ec2.py cannot find all slave instances. In particular: > * I created a master & slave and configured them. > * I created new slave instances from the original slave ("Launch More Like > This"). > * I tried to relaunch the cluster, and it could only find the original slave. > Old versions of the script worked. The latest working commit which edited > that .py script is: a0bcbc159e89be868ccc96175dbf1439461557e1 > There may be a problem with this PR: > [https://github.com/apache/spark/pull/1899]. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3213) spark_ec2.py cannot find slave instances launched with "Launch More Like This"
[ https://issues.apache.org/jira/browse/SPARK-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110353#comment-14110353 ] Patrick Wendell commented on SPARK-3213: Hey I don't think we previously supported adding slaves like this, so I'm renaming this from a bug to a feature :) > spark_ec2.py cannot find slave instances launched with "Launch More Like This" > -- > > Key: SPARK-3213 > URL: https://issues.apache.org/jira/browse/SPARK-3213 > Project: Spark > Issue Type: Improvement > Components: EC2 >Affects Versions: 1.1.0 >Reporter: Joseph K. Bradley >Priority: Blocker > Attachments: Screen Shot 2014-08-25 at 6.45.35 PM.png > > > spark_ec2.py cannot find all slave instances. In particular: > * I created a master & slave and configured them. > * I created new slave instances from the original slave ("Launch More Like > This"). > * I tried to relaunch the cluster, and it could only find the original slave. > Old versions of the script worked. The latest working commit which edited > that .py script is: a0bcbc159e89be868ccc96175dbf1439461557e1 > There may be a problem with this PR: > [https://github.com/apache/spark/pull/1899]. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3223) runAsSparkUser cannot change HDFS write permission properly in mesos cluster mode
[ https://issues.apache.org/jira/browse/SPARK-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3223: --- Priority: Critical (was: Major) > runAsSparkUser cannot change HDFS write permission properly in mesos cluster > mode > - > > Key: SPARK-3223 > URL: https://issues.apache.org/jira/browse/SPARK-3223 > Project: Spark > Issue Type: Bug > Components: Input/Output, Mesos >Affects Versions: 1.0.2 >Reporter: Jongyoul Lee >Priority: Critical > Fix For: 1.0.3 > > > While running Mesos with the --no-switch_user option, the HDFS account name > differs between the driver and the executor, which causes a permission error > at the last stage. The executor's id is Mesos' user id, while the driver's id > is that of whoever runs spark-submit. So moving output from > _temporary/path/to/output/part- to /output/path/part- fails with a permission > error. The solution is simply to set SPARK_USER to HADOOP_USER_NAME when > MesosExecutorBackend calls runAsSparkUser. HADOOP_USER_NAME is used when > FileSystem gets the user. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3224) FetchFailed stages could show up multiple times in failed stages in web ui
[ https://issues.apache.org/jira/browse/SPARK-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3224: --- Priority: Blocker (was: Critical) > FetchFailed stages could show up multiple times in failed stages in web ui > -- > > Key: SPARK-3224 > URL: https://issues.apache.org/jira/browse/SPARK-3224 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Reynold Xin >Assignee: Reynold Xin >Priority: Blocker > > Today I saw a job in which a reduce stage failed and showed up a lot of times > in the failed stages. I think the reason is that the DAGScheduler sends the > stage-completed (with failure) event multiple times in the case of FetchFailed. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
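An illustrative listener-side guard against the duplicate events, not the actual fix (which belongs in the DAGScheduler/UI code); the class and field names below are made up, while SparkListener and SparkListenerStageCompleted are the public listener API:
{code}
import scala.collection.mutable
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

// Record each failed stage at most once, even if the stage-completed
// (with failure) event is delivered repeatedly for a FetchFailed stage.
class DedupFailedStages extends SparkListener {
  private val seen = mutable.HashSet[Int]()
  val failedStageIds = mutable.ArrayBuffer[Int]()

  override def onStageCompleted(event: SparkListenerStageCompleted) {
    val info = event.stageInfo
    if (info.failureReason.isDefined && seen.add(info.stageId)) {
      failedStageIds += info.stageId
    }
  }
}
{code}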
[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110342#comment-14110342 ] Alex commented on SPARK-2344: - Hi, I'm currently working on the implementation of FCM myself. Also see this: https://issues.apache.org/jira/browse/SPARK-2430 (JIRA for a Standardized Clustering Algorithm API) > Add Fuzzy C-Means algorithm to MLlib > > > Key: SPARK-2344 > URL: https://issues.apache.org/jira/browse/SPARK-2344 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Alex >Priority: Minor > Original Estimate: 1m > Remaining Estimate: 1m > > I would like to add an FCM (Fuzzy C-Means) algorithm to MLlib. > FCM is very similar to K-Means, which is already implemented; they > differ only in the degree of relationship each point has with each cluster: > in FCM the relationship is in the range [0..1], whereas in K-Means it is 0/1. > As part of the implementation I would like to: > - create a base class for K-Means and FCM > - implement the relationship for each algorithm differently (in its class) > I'd like this to be assigned to me. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3226) Doc update for MLlib dependencies
[ https://issues.apache.org/jira/browse/SPARK-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110335#comment-14110335 ] Apache Spark commented on SPARK-3226: - User 'mengxr' has created a pull request for this issue: https://github.com/apache/spark/pull/2128 > Doc update for MLlib dependencies > - > > Key: SPARK-3226 > URL: https://issues.apache.org/jira/browse/SPARK-3226 > Project: Spark > Issue Type: Sub-task > Components: Documentation, MLlib >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng > > to mention `-Pnetlib-lgpl` option. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3227) Add MLlib migration guide (1.0 -> 1.1)
Xiangrui Meng created SPARK-3227: Summary: Add MLlib migration guide (1.0 -> 1.1) Key: SPARK-3227 URL: https://issues.apache.org/jira/browse/SPARK-3227 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib Reporter: Xiangrui Meng Assignee: Joseph K. Bradley Most API changes happen in decision tree. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3226) Doc update for MLlib dependencies
Xiangrui Meng created SPARK-3226: Summary: Doc update for MLlib dependencies Key: SPARK-3226 URL: https://issues.apache.org/jira/browse/SPARK-3226 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib Reporter: Xiangrui Meng Assignee: Xiangrui Meng to mention `-Pnetlib-lgpl` option. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2839) Documentation for statistical functions
[ https://issues.apache.org/jira/browse/SPARK-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-2839: - Assignee: Burak Yavuz (was: Xiangrui Meng) > Documentation for statistical functions > --- > > Key: SPARK-2839 > URL: https://issues.apache.org/jira/browse/SPARK-2839 > Project: Spark > Issue Type: Sub-task > Components: Documentation, MLlib >Reporter: Xiangrui Meng >Assignee: Burak Yavuz > > Add documentation and code examples for statistical functions to MLlib's > programming guide. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3223) runAsSparkUser cannot change HDFS write permission properly in mesos cluster mode
[ https://issues.apache.org/jira/browse/SPARK-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jongyoul Lee updated SPARK-3223: Description: While running Mesos with the --no-switch_user option, the HDFS account name differs between the driver and the executor, which causes a permission error at the last stage. The executor's id is Mesos' user id, while the driver's id is that of whoever runs spark-submit. So moving output from _temporary/path/to/output/part- to /output/path/part- fails with a permission error. The solution is simply to set SPARK_USER to HADOOP_USER_NAME when MesosExecutorBackend calls runAsSparkUser. HADOOP_USER_NAME is used when FileSystem gets the user. (was: While running Mesos with the --no-switch_user option, the HDFS account name differs between the driver and the executor, which causes a permission error at the last stage. The executor's id is Mesos' user id, while the driver's id is that of whoever runs spark-submit. So moving output from _temporary/path/to/output/part- to /output/path/part- fails with a permission error. The solution is simply to set SPARK_USER to HADOOP_USER_NAME when MesosExecutorBackend calls runAsSparkUser.) > runAsSparkUser cannot change HDFS write permission properly in mesos cluster > mode > - > > Key: SPARK-3223 > URL: https://issues.apache.org/jira/browse/SPARK-3223 > Project: Spark > Issue Type: Bug > Components: Input/Output, Mesos >Affects Versions: 1.0.2 >Reporter: Jongyoul Lee > Fix For: 1.0.3 > > > While running Mesos with the --no-switch_user option, the HDFS account name > differs between the driver and the executor, which causes a permission error > at the last stage. The executor's id is Mesos' user id, while the driver's id > is that of whoever runs spark-submit. So moving output from > _temporary/path/to/output/part- to /output/path/part- fails with a permission > error. The solution is simply to set SPARK_USER to HADOOP_USER_NAME when > MesosExecutorBackend calls runAsSparkUser. HADOOP_USER_NAME is used when > FileSystem gets the user. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
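A hedged sketch of the behavior the description proposes. The method name mirrors SparkHadoopUtil.runAsSparkUser, but the body is illustrative, not the actual patch; it assumes Hadoop's UserGroupInformation honors HADOOP_USER_NAME as a system property (the JVM cannot set environment variables for itself):
{code}
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// Run `func` as the Spark user, and expose the same account name via
// HADOOP_USER_NAME so FileSystem resolves it on the executor as well.
def runAsSparkUser(func: () => Unit) {
  val user = Option(System.getenv("SPARK_USER"))
    .getOrElse(System.getProperty("user.name"))
  System.setProperty("HADOOP_USER_NAME", user) // what FileSystem reads
  val ugi = UserGroupInformation.createRemoteUser(user)
  ugi.doAs(new PrivilegedExceptionAction[Unit] {
    def run(): Unit = func()
  })
}
{code}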
[jira] [Updated] (SPARK-3223) runAsSparkUser cannot change HDFS write permission properly in mesos cluster mode
[ https://issues.apache.org/jira/browse/SPARK-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jongyoul Lee updated SPARK-3223: Target Version/s: 1.0.3 (was: 1.1.0) > runAsSparkUser cannot change HDFS write permission properly in mesos cluster > mode > - > > Key: SPARK-3223 > URL: https://issues.apache.org/jira/browse/SPARK-3223 > Project: Spark > Issue Type: Bug > Components: Input/Output, Mesos >Affects Versions: 1.0.2 >Reporter: Jongyoul Lee > Fix For: 1.0.3 > > > While running Mesos with the --no-switch_user option, the HDFS account name > differs between the driver and the executor, which causes a permission error > at the last stage. The executor's id is Mesos' user id, while the driver's id > is that of whoever runs spark-submit. So moving output from > _temporary/path/to/output/part- to /output/path/part- fails with a permission > error. The solution is simply to set SPARK_USER to HADOOP_USER_NAME when > MesosExecutorBackend calls runAsSparkUser. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3223) runAsSparkUser cannot change HDFS write permission properly in mesos cluster mode
[ https://issues.apache.org/jira/browse/SPARK-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jongyoul Lee updated SPARK-3223: Fix Version/s: 1.0.3 > runAsSparkUser cannot change HDFS write permission properly in mesos cluster > mode > - > > Key: SPARK-3223 > URL: https://issues.apache.org/jira/browse/SPARK-3223 > Project: Spark > Issue Type: Bug > Components: Input/Output, Mesos >Affects Versions: 1.0.2 >Reporter: Jongyoul Lee > Fix For: 1.0.3 > > > While running Mesos with the --no-switch_user option, the HDFS account name > differs between the driver and the executor, which causes a permission error > at the last stage. The executor's id is Mesos' user id, while the driver's id > is that of whoever runs spark-submit. So moving output from > _temporary/path/to/output/part- to /output/path/part- fails with a permission > error. The solution is simply to set SPARK_USER to HADOOP_USER_NAME when > MesosExecutorBackend calls runAsSparkUser. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3224) FetchFailed stages could show up multiple times in failed stages in web ui
[ https://issues.apache.org/jira/browse/SPARK-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-3224: --- Priority: Critical (was: Major) > FetchFailed stages could show up multiple times in failed stages in web ui > -- > > Key: SPARK-3224 > URL: https://issues.apache.org/jira/browse/SPARK-3224 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Reynold Xin >Assignee: Reynold Xin >Priority: Critical > > Today I saw a job in which a reduce stage failed and showed up a lot of times > in the failed stages. I think the reason is that the DAGScheduler sends the > stage-completed (with failure) event multiple times in the case of FetchFailed. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3225) Typo in script
WangTaoTheTonic created SPARK-3225: -- Summary: Typo in script Key: SPARK-3225 URL: https://issues.apache.org/jira/browse/SPARK-3225 Project: Spark Issue Type: Bug Components: Deploy Reporter: WangTaoTheTonic Priority: Minor use_conf_dir => user_conf_dir in load-spark-env.sh. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3224) FetchFailed stages could show up multiple times in failed stages in web ui
[ https://issues.apache.org/jira/browse/SPARK-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110315#comment-14110315 ] Apache Spark commented on SPARK-3224: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/2127 > FetchFailed stages could show up multiple times in failed stages in web ui > -- > > Key: SPARK-3224 > URL: https://issues.apache.org/jira/browse/SPARK-3224 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Reynold Xin >Assignee: Reynold Xin > > Today I saw a job in which a reduce stage failed and showed up a lot of times > in the failed stages. I think the reason is that the DAGScheduler sends the > stage-completed (with failure) event multiple times in the case of FetchFailed. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3223) runAsSparkUser cannot change HDFS write permission properly in mesos cluster mode
[ https://issues.apache.org/jira/browse/SPARK-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110314#comment-14110314 ] Apache Spark commented on SPARK-3223: - User 'jongyoul' has created a pull request for this issue: https://github.com/apache/spark/pull/2126 > runAsSparkUser cannot change HDFS write permission properly in mesos cluster > mode > - > > Key: SPARK-3223 > URL: https://issues.apache.org/jira/browse/SPARK-3223 > Project: Spark > Issue Type: Bug > Components: Input/Output, Mesos >Affects Versions: 1.0.2 >Reporter: Jongyoul Lee > > While running Mesos with the --no-switch_user option, the HDFS account name > differs between the driver and the executor, which causes a permission error > at the last stage. The executor's id is Mesos' user id, while the driver's id > is that of whoever runs spark-submit. So moving output from > _temporary/path/to/output/part- to /output/path/part- fails with a permission > error. The solution is simply to set SPARK_USER to HADOOP_USER_NAME when > MesosExecutorBackend calls runAsSparkUser. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3223) runAsSparkUser cannot change HDFS write permission properly in mesos cluster mode
[ https://issues.apache.org/jira/browse/SPARK-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3223: --- Priority: Major (was: Blocker) > runAsSparkUser cannot change HDFS write permission properly in mesos cluster > mode > - > > Key: SPARK-3223 > URL: https://issues.apache.org/jira/browse/SPARK-3223 > Project: Spark > Issue Type: Bug > Components: Input/Output, Mesos >Affects Versions: 1.0.2 >Reporter: Jongyoul Lee > > While running Mesos with the --no-switch_user option, the HDFS account name > differs between the driver and the executor, which causes a permission error > at the last stage. The executor's id is Mesos' user id, while the driver's id > is that of whoever runs spark-submit. So moving output from > _temporary/path/to/output/part- to /output/path/part- fails with a permission > error. The solution is simply to set SPARK_USER to HADOOP_USER_NAME when > MesosExecutorBackend calls runAsSparkUser. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3223) runAsSparkUser cannot change HDFS write permission properly in mesos cluster mode
[ https://issues.apache.org/jira/browse/SPARK-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3223: --- Fix Version/s: (was: 1.1.0) > runAsSparkUser cannot change HDFS write permission properly in mesos cluster > mode > - > > Key: SPARK-3223 > URL: https://issues.apache.org/jira/browse/SPARK-3223 > Project: Spark > Issue Type: Bug > Components: Input/Output, Mesos >Affects Versions: 1.0.2 >Reporter: Jongyoul Lee >Priority: Blocker > > While running Mesos with the --no-switch_user option, the HDFS account name > differs between the driver and the executor, which causes a permission error > at the last stage. The executor's id is Mesos' user id, while the driver's id > is that of whoever runs spark-submit. So moving output from > _temporary/path/to/output/part- to /output/path/part- fails with a permission > error. The solution is simply to set SPARK_USER to HADOOP_USER_NAME when > MesosExecutorBackend calls runAsSparkUser. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3224) FetchFailed stages could show up multiple times in failed stages in web ui
Reynold Xin created SPARK-3224: -- Summary: FetchFailed stages could show up multiple times in failed stages in web ui Key: SPARK-3224 URL: https://issues.apache.org/jira/browse/SPARK-3224 Project: Spark Issue Type: Bug Components: Web UI Reporter: Reynold Xin Assignee: Reynold Xin Today I saw a job in which a reduce stage failed and showed up a lot of times in the failed stages. I think the reason is that the DAGScheduler sends the stage-completed (with failure) event multiple times in the case of FetchFailed. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3223) runAsSparkUser cannot change HDFS write permission properly in mesos cluster mode
Jongyoul Lee created SPARK-3223: --- Summary: runAsSparkUser cannot change HDFS write permission properly in mesos cluster mode Key: SPARK-3223 URL: https://issues.apache.org/jira/browse/SPARK-3223 Project: Spark Issue Type: Bug Components: Input/Output, Mesos Affects Versions: 1.0.2 Reporter: Jongyoul Lee Priority: Blocker Fix For: 1.1.0 While running Mesos with the --no-switch_user option, the HDFS account name differs between the driver and the executor, which causes a permission error at the last stage. The executor's id is Mesos' user id, while the driver's id is that of whoever runs spark-submit. So moving output from _temporary/path/to/output/part- to /output/path/part- fails with a permission error. The solution is simply to set SPARK_USER to HADOOP_USER_NAME when MesosExecutorBackend calls runAsSparkUser. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3222) cross join support in HiveQl
[ https://issues.apache.org/jira/browse/SPARK-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated SPARK-3222: --- Component/s: SQL > cross join support in HiveQl > > > Key: SPARK-3222 > URL: https://issues.apache.org/jira/browse/SPARK-3222 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Adrian Wang > > Spark SQL's HiveQl should support cross join. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110268#comment-14110268 ] Yu Ishikawa edited comment on SPARK-2344 at 8/26/14 5:15 AM: - Hi Alex, It seems that the fuzzy c-means algorithm has not been merged into Spark yet. I am implementing that algorithm and creating a base class for k-means and FCM. Would you please assign this issue to me? was (Author: yuu.ishik...@gmail.com): HI Alex, It seems that fuzzy c-means algorithm has been merged into Spark yet. I am implementing that algorithm and create a base class for k-means and FCM. Would you please assign this issue to me. > Add Fuzzy C-Means algorithm to MLlib > > > Key: SPARK-2344 > URL: https://issues.apache.org/jira/browse/SPARK-2344 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Alex >Priority: Minor > Original Estimate: 1m > Remaining Estimate: 1m > > I would like to add an FCM (Fuzzy C-Means) algorithm to MLlib. > FCM is very similar to K-Means, which is already implemented; they > differ only in the degree of relationship each point has with each cluster: > in FCM the relationship is in the range [0..1], whereas in K-Means it is 0/1. > As part of the implementation I would like to: > - create a base class for K-Means and FCM > - implement the relationship for each algorithm differently (in its class) > I'd like this to be assigned to me. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110268#comment-14110268 ] Yu Ishikawa commented on SPARK-2344: HI Alex, It seems that fuzzy c-means algorithm has been merged into Spark yet. I am implementing that algorithm and create a base class for k-means and FCM. Would you please assign this issue to me. > Add Fuzzy C-Means algorithm to MLlib > > > Key: SPARK-2344 > URL: https://issues.apache.org/jira/browse/SPARK-2344 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Alex >Priority: Minor > Original Estimate: 1m > Remaining Estimate: 1m > > I would like to add an FCM (Fuzzy C-Means) algorithm to MLlib. > FCM is very similar to K-Means, which is already implemented; they > differ only in the degree of relationship each point has with each cluster: > in FCM the relationship is in the range [0..1], whereas in K-Means it is 0/1. > As part of the implementation I would like to: > - create a base class for K-Means and FCM > - implement the relationship for each algorithm differently (in its class) > I'd like this to be assigned to me. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2541) Standalone mode can't access secure HDFS anymore
[ https://issues.apache.org/jira/browse/SPARK-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110250#comment-14110250 ] qingtang commented on SPARK-2541: - Hi Thomas, could you share how you access secure HDFS from a standalone deployment of Spark? > Standalone mode can't access secure HDFS anymore > > > Key: SPARK-2541 > URL: https://issues.apache.org/jira/browse/SPARK-2541 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 1.0.0, 1.0.1 >Reporter: Thomas Graves > > In Spark 0.9.x you could access secure HDFS from a standalone deployment; that > doesn't work in 1.x anymore. > It looks like the issue is in SparkHadoopUtil.runAsSparkUser. Previously it > wouldn't do the doAs if the currentUser == user. Not sure how it behaves > when the daemons run as a super user but SPARK_USER is set to someone else. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3155) Support DecisionTree pruning
[ https://issues.apache.org/jira/browse/SPARK-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110241#comment-14110241 ] Joseph K. Bradley commented on SPARK-3155: -- Hi Qiping, thanks very much for the offer! It would be great to get your help. [~mengxr] Could you please assign this? Coordination: I just submitted a PR for DecisionTree [https://github.com/apache/spark/pull/2125] which makes some major changes. After that PR, I hope to work on other parts of MLlib. However, [~manishamde] plans to work on generalizing DecisionTree to include random forests, so you may want to coordinate with him. More thoughts on pruning: In my mind, pruning is related to this JIRA or [https://issues.apache.org/jira/browse/SPARK-3161], which would change the example-to-node mapping for the training data. I figure the example-to-node mapping should be treated the same way for the training and pruning/validation sets. > Support DecisionTree pruning > > > Key: SPARK-3155 > URL: https://issues.apache.org/jira/browse/SPARK-3155 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Joseph K. Bradley > > Improvement: accuracy, computation > Summary: Pruning is a common method for preventing overfitting with decision > trees. A smart implementation can prune the tree during training in order to > avoid training parts of the tree which would be pruned eventually anyway. > DecisionTree does not currently support pruning. > Pruning: A “pruning” of a tree is a subtree with the same root node, but > with zero or more branches removed. > A naive implementation prunes as follows: > (1) Train a depth-K tree using a training set. > (2) Compute the optimal prediction at each node (including internal nodes) > based on the training set. > (3) Take a held-out validation set, and use the tree to make predictions for > each validation example. This allows one to compute the validation error > made at each node in the tree (based on the predictions computed in step (2)). > (4) For each pair of leaves with the same parent, compare the total error on > the validation set made by the leaves’ predictions with the error made by the > parent’s predictions. Remove the leaves if the parent has lower error. > A smarter implementation prunes during training, computing the error on the > validation set made by each node as it is trained. Whenever two children > increase the validation error, they are pruned, and no more training is > required on that branch. > It is common to use about 1/3 of the data for pruning. Note that pruning is > important when using a tree directly for prediction. It is less important > when combining trees via ensemble methods. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
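A toy rendering of step (4) of the naive algorithm quoted above, on a plain binary tree rather than MLlib's Node class; the case-class fields are made up for the sketch, and valError stands for the total validation error charged to a node's own prediction:
{code}
// Bottom-up pruning: after pruning both subtrees, collapse a pair of
// sibling leaves whenever the parent's validation error is no worse
// than the sum of the leaves' errors.
case class Node(prediction: Double, valError: Double,
                left: Option[Node] = None, right: Option[Node] = None) {
  def isLeaf: Boolean = left.isEmpty && right.isEmpty
}

def prune(node: Node): Node = (node.left, node.right) match {
  case (Some(l), Some(r)) =>
    val (pl, pr) = (prune(l), prune(r))
    if (pl.isLeaf && pr.isLeaf && node.valError <= pl.valError + pr.valError)
      node.copy(left = None, right = None) // parent replaces the pair
    else
      node.copy(left = Some(pl), right = Some(pr))
  case _ => node // already a leaf
}
{code}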
[jira] [Commented] (SPARK-3086) Use 1-indexing for decision tree nodes
[ https://issues.apache.org/jira/browse/SPARK-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110232#comment-14110232 ] Apache Spark commented on SPARK-3086: - User 'jkbradley' has created a pull request for this issue: https://github.com/apache/spark/pull/2125 > Use 1-indexing for decision tree nodes > -- > > Key: SPARK-3086 > URL: https://issues.apache.org/jira/browse/SPARK-3086 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Xiangrui Meng >Assignee: Joseph K. Bradley >Priority: Minor > > 1-indexing is good for binary trees. The root node gets index 1. And for any > node with index i, its left child is (i << 1), right child is (i << 1) + 1, > parent is (i >> 1), and its level is `java.lang.Integer.highestOneBit(idx)` > (also 1-indexing for levels). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
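The 1-indexed node arithmetic from this ticket, spelled out as plain Scala:
{code}
def leftChild(i: Int): Int = i << 1
def rightChild(i: Int): Int = (i << 1) + 1
def parent(i: Int): Int = i >> 1
def level(i: Int): Int = java.lang.Integer.highestOneBit(i)

// The root is 1 (level value 1); node 5 has children 10 and 11 and
// parent 2, and highestOneBit(5) = 4 marks it as third-level.
{code}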
[jira] [Commented] (SPARK-3156) DecisionTree: Order categorical features adaptively
[ https://issues.apache.org/jira/browse/SPARK-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110234#comment-14110234 ] Apache Spark commented on SPARK-3156: - User 'jkbradley' has created a pull request for this issue: https://github.com/apache/spark/pull/2125 > DecisionTree: Order categorical features adaptively > --- > > Key: SPARK-3156 > URL: https://issues.apache.org/jira/browse/SPARK-3156 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Joseph K. Bradley >Assignee: Joseph K. Bradley > > Improvement: accuracy > Currently, ordered categorical features use a fixed bin ordering chosen > before training based on a subsample of the data. (See the code using > centroids in findSplitsBins().) > Proposal: Choose the ordering adaptively for every split. This would require > a bit more computation on the master, but could improve results by splitting > more intelligently. > Required changes: The result of aggregation is used in > findAggForOrderedFeatureClassification() to compute running totals over the > pre-set ordering of categorical feature values. The stats should instead be > used to choose a new ordering of categories, before computing running totals. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
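An illustrative version of the ordering step the proposal describes, outside the DecisionTree code; the CatStats shape is invented for the sketch. Categories are ordered by their label centroid so the usual ordered-feature split scan can then run over the running totals:
{code}
// Per-category aggregates for one categorical feature at one node.
case class CatStats(category: Int, labelSum: Double, count: Long)

// Adaptive ordering: sort categories by mean label ("centroid") for
// this node's data before scanning running totals for split candidates.
def orderCategories(stats: Seq[CatStats]): Seq[Int] =
  stats.sortBy(s => s.labelSum / s.count).map(_.category)
{code}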
[jira] [Commented] (SPARK-3043) DecisionTree aggregation is inefficient
[ https://issues.apache.org/jira/browse/SPARK-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110233#comment-14110233 ] Apache Spark commented on SPARK-3043: - User 'jkbradley' has created a pull request for this issue: https://github.com/apache/spark/pull/2125 > DecisionTree aggregation is inefficient > --- > > Key: SPARK-3043 > URL: https://issues.apache.org/jira/browse/SPARK-3043 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.1.0 >Reporter: Joseph K. Bradley >Assignee: Joseph K. Bradley > > 2 major efficiency issues in computation and storage: > (1) DecisionTree aggregation involves reshaping data unnecessarily. > E.g., the internal methods extractNodeInfo() and getBinDataForNode() involve > reshaping the data multiple times without real computation. > (2) DecisionTree splits and aggregate bins can include many unused > bins/splits. > The same number of splits/bins are used for all features. E.g., if there is > a continuous feature which uses 100 bins, then there will also be 100 bins > allocated for all binary features, even though only 2 are necessary. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3098) In some cases, operation zipWithIndex gets wrong results
[ https://issues.apache.org/jira/browse/SPARK-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110229#comment-14110229 ] Guoqiang Li commented on SPARK-3098: Now the bug is this: after the shuffle fetches, multiple calls to the {{zip}}, {{zipWithIndex}}, or {{zipWithUniqueId}} operations return inconsistent results. [The PR 2083|https://github.com/apache/spark/pull/2083] will affect performance; I am testing its specific performance impact. Another solution is to re-implement the above operations. > In some cases, operation zipWithIndex gets wrong results > -- > > Key: SPARK-3098 > URL: https://issues.apache.org/jira/browse/SPARK-3098 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.1 >Reporter: Guoqiang Li >Priority: Critical > > The reproduce code: > {code} > val c = sc.parallelize(1 to 7899).flatMap { i => > (1 to 1).toSeq.map(p => i * 6000 + p) > }.distinct().zipWithIndex() > c.join(c).filter(t => t._2._1 != t._2._2).take(3) > {code} > => > {code} > Array[(Int, (Long, Long))] = Array((1732608,(11,12)), (45515264,(12,13)), > (36579712,(13,14))) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3222) cross join support in HiveQl
Adrian Wang created SPARK-3222: -- Summary: cross join support in HiveQl Key: SPARK-3222 URL: https://issues.apache.org/jira/browse/SPARK-3222 Project: Spark Issue Type: New Feature Reporter: Adrian Wang Spark SQL's HiveQl should support cross join. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-2976) Replace tabs with spaces
[ https://issues.apache.org/jira/browse/SPARK-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2976. -- Resolution: Fixed Fix Version/s: 1.2.0 > Replace tabs with spaces > > > Key: SPARK-2976 > URL: https://issues.apache.org/jira/browse/SPARK-2976 > Project: Spark > Issue Type: Improvement >Affects Versions: 1.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > Fix For: 1.2.0 > > > Currently, there are too many tabs in source files, which does not conform > to the coding style. > I saw the following 3 files have tabs. > * sorttable.js > * JavaPageRank.java > * JavaKinesisWordCountASL.java -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3155) Support DecisionTree pruning
[ https://issues.apache.org/jira/browse/SPARK-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110194#comment-14110194 ] Qiping Li commented on SPARK-3155: -- Hi Joseph, glad to see you have considered supporting pruning in MLlib's decision tree. Is anyone working on this issue, or can you assign it to me? I'm ready to help on this module. > Support DecisionTree pruning > > > Key: SPARK-3155 > URL: https://issues.apache.org/jira/browse/SPARK-3155 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Joseph K. Bradley > > Improvement: accuracy, computation > Summary: Pruning is a common method for preventing overfitting with decision > trees. A smart implementation can prune the tree during training in order to > avoid training parts of the tree which would be pruned eventually anyway. > DecisionTree does not currently support pruning. > Pruning: A “pruning” of a tree is a subtree with the same root node, but > with zero or more branches removed. > A naive implementation prunes as follows: > (1) Train a depth-K tree using a training set. > (2) Compute the optimal prediction at each node (including internal nodes) > based on the training set. > (3) Take a held-out validation set, and use the tree to make predictions for > each validation example. This allows one to compute the validation error > made at each node in the tree (based on the predictions computed in step (2)). > (4) For each pair of leaves with the same parent, compare the total error on > the validation set made by the leaves’ predictions with the error made by the > parent’s predictions. Remove the leaves if the parent has lower error. > A smarter implementation prunes during training, computing the error on the > validation set made by each node as it is trained. Whenever two children > increase the validation error, they are pruned, and no more training is > required on that branch. > It is common to use about 1/3 of the data for pruning. Note that pruning is > important when using a tree directly for prediction. It is less important > when combining trees via ensemble methods. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2976) Replace tabs with spaces
[ https://issues.apache.org/jira/browse/SPARK-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2976: - Summary: Replace tabs with spaces (was: Too many ugly tabs instead of white spaces) > Replace tabs with spaces > > > Key: SPARK-2976 > URL: https://issues.apache.org/jira/browse/SPARK-2976 > Project: Spark > Issue Type: Improvement >Affects Versions: 1.1.0 >Reporter: Kousuke Saruta >Priority: Minor > > Currently, there are too many tabs in source files, which does not conform > to the coding style. > I saw the following 3 files have tabs. > * sorttable.js > * JavaPageRank.java > * JavaKinesisWordCountASL.java -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2976) Replace tabs with spaces
[ https://issues.apache.org/jira/browse/SPARK-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2976: - Assignee: Kousuke Saruta > Replace tabs with spaces > > > Key: SPARK-2976 > URL: https://issues.apache.org/jira/browse/SPARK-2976 > Project: Spark > Issue Type: Improvement >Affects Versions: 1.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > Currently, there are too many tabs in source files, which does not conform > to the coding style. > I saw the following 3 files have tabs. > * sorttable.js > * JavaPageRank.java > * JavaKinesisWordCountASL.java -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2481) The environment variable SPARK_HISTORY_OPTS is overwritten in start-history-server.sh
[ https://issues.apache.org/jira/browse/SPARK-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110187#comment-14110187 ] Andrew Or commented on SPARK-2481: -- Resolved by https://github.com/apache/spark/pull/1341 > The environment variable SPARK_HISTORY_OPTS is overwritten in > start-history-server.sh > -- > > Key: SPARK-2481 > URL: https://issues.apache.org/jira/browse/SPARK-2481 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0, 1.0.1 >Reporter: Guoqiang Li >Assignee: Guoqiang Li > Fix For: 1.1.0 > > > If we have the following code in conf/spark-env.sh: > {{export SPARK_HISTORY_OPTS="-DSpark.history.XX=XX"}} > then the environment variable SPARK_HISTORY_OPTS is overwritten in > [start-history-server.sh|https://github.com/apache/spark/blob/master/sbin/start-history-server.sh] > {code} > if [ $# != 0 ]; then > echo "Using command line arguments for setting the log directory is > deprecated. Please " > echo "set the spark.history.fs.logDirectory configuration option instead." > export SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS > -Dspark.history.fs.logDirectory=$1" > fi > {code} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-2481) The environment variable SPARK_HISTORY_OPTS is overwritten in start-history-server.sh
[ https://issues.apache.org/jira/browse/SPARK-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-2481. -- Resolution: Fixed Fix Version/s: 1.1.0 Target Version/s: 1.1.0 > The environment variable SPARK_HISTORY_OPTS is overwritten in > start-history-server.sh > -- > > Key: SPARK-2481 > URL: https://issues.apache.org/jira/browse/SPARK-2481 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0, 1.0.1 >Reporter: Guoqiang Li >Assignee: Guoqiang Li > Fix For: 1.1.0 > > > If we have the following code in conf/spark-env.sh: > {{export SPARK_HISTORY_OPTS="-DSpark.history.XX=XX"}} > then the environment variable SPARK_HISTORY_OPTS is overwritten in > [start-history-server.sh|https://github.com/apache/spark/blob/master/sbin/start-history-server.sh] > {code} > if [ $# != 0 ]; then > echo "Using command line arguments for setting the log directory is > deprecated. Please " > echo "set the spark.history.fs.logDirectory configuration option instead." > export SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS > -Dspark.history.fs.logDirectory=$1" > fi > {code} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3098) In some cases, operation zipWithIndex gets wrong results
[ https://issues.apache.org/jira/browse/SPARK-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110183#comment-14110183 ] Matei Zaharia commented on SPARK-3098: -- Sorry, I don't understand -- what exactly is the bug here? There's no guarantee about the ordering of elements in distinct(). If you're relying on zipWithIndex creating specific values, that's a wrong assumption to make. The question is just whether the *set* of elements returned by zipWithIndex is correct. I don't think we should change our randomize() to be more deterministic here just because you want zipWithIndex. We have to allow shuffle fetches to occur in a random order, or else we can get inefficiency when there are hotspots. If you'd like to make sure values land in specific partitions and in a specific order in each partition, you can partition the data with your own Partitioner, and run a mapPartitions that sorts them within each one. > In some cases, operation zipWithIndex gets wrong results > -- > > Key: SPARK-3098 > URL: https://issues.apache.org/jira/browse/SPARK-3098 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.1 >Reporter: Guoqiang Li >Priority: Critical > > The reproduce code: > {code} > val c = sc.parallelize(1 to 7899).flatMap { i => > (1 to 1).toSeq.map(p => i * 6000 + p) > }.distinct().zipWithIndex() > c.join(c).filter(t => t._2._1 != t._2._2).take(3) > {code} > => > {code} > Array[(Int, (Long, Long))] = Array((1732608,(11,12)), (45515264,(12,13)), > (36579712,(13,14))) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
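A sketch of the alternative Matei describes above; the partition count and the use of the element itself as the sort key are arbitrary choices for illustration. Pinning elements to partitions with an explicit Partitioner and sorting within each partition makes the subsequent zipWithIndex independent of shuffle-fetch order:
{code}
import org.apache.spark.{HashPartitioner, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD functions in Spark 1.x

def deterministicIndex(sc: SparkContext, data: Seq[Int]) =
  sc.parallelize(data)
    .map(x => (x, null))
    .partitionBy(new HashPartitioner(8)) // element -> partition is fixed
    .mapPartitions(it => it.map(_._1).toArray.sorted.iterator,
                   preservesPartitioning = true) // fixed order inside
    .zipWithIndex()
{code}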
[jira] [Updated] (SPARK-3037) Add ArrayType containing null value support to Parquet.
[ https://issues.apache.org/jira/browse/SPARK-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3037: Assignee: Takuya Ueshin > Add ArrayType containing null value support to Parquet. > --- > > Key: SPARK-3037 > URL: https://issues.apache.org/jira/browse/SPARK-3037 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Blocker > > Parquet support should handle {{ArrayType}} when {{containsNull}} is {{true}}. > When {{containsNull}} is {{true}}, the schema should be as follows: > {noformat} > message root { > optional group a (LIST) { > repeated group bag { > optional int32 array_element; > } > } > } > {noformat} > FYI: > Hive's Parquet writer *always* uses this schema, and its reader can read only > from this schema, i.e. the current Parquet support in Spark SQL is not compatible > with Hive. > NOTICE: > If Hive compatibility is the top priority, we also have to use this schema > regardless of {{containsNull}}, which will break backward compatibility. > But using this schema could affect performance. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3036) Add MapType containing null value support to Parquet.
[ https://issues.apache.org/jira/browse/SPARK-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3036: Assignee: Takuya Ueshin > Add MapType containing null value support to Parquet. > - > > Key: SPARK-3036 > URL: https://issues.apache.org/jira/browse/SPARK-3036 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Blocker > > The current Parquet schema for {{MapType}} is as follows regardless of > {{valueContainsNull}}: > {noformat} > message root { > optional group a (MAP) { > repeated group map (MAP_KEY_VALUE) { > required int32 key; > required int32 value; > } > } > } > {noformat} > and if the map contains a {{null}} value, it throws a runtime exception. > To handle a {{MapType}} containing {{null}} values, the schema should be as > follows if {{valueContainsNull}} is {{true}}: > {noformat} > message root { > optional group a (MAP) { > repeated group map (MAP_KEY_VALUE) { > required int32 key; > optional int32 value; > } > } > } > {noformat} > FYI: > Hive's Parquet writer *always* uses the latter schema, but its reader can read > from both schemas. > NOTICE: > This change will break backward compatibility when the schema is read from > Parquet metadata ({{"org.apache.spark.sql.parquet.row.metadata"}}). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
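For concreteness, the two flags discussed in SPARK-3036/SPARK-3037 as they appear when building a schema by hand; a minimal sketch against the Catalyst type API of this era, in which the types lived under org.apache.spark.sql.catalyst.types:
{code}
import org.apache.spark.sql.catalyst.types._

// An array whose elements may be null, and a map whose values may be
// null: the cases the Parquet converters above must round-trip.
val schema = StructType(Seq(
  StructField("a", ArrayType(IntegerType, containsNull = true),
              nullable = true),
  StructField("b", MapType(IntegerType, IntegerType, valueContainsNull = true),
              nullable = true)))
{code}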
[jira] [Commented] (SPARK-2636) nowhere to get job identifier while submitting spark job through Spark API
[ https://issues.apache.org/jira/browse/SPARK-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110172#comment-14110172 ] Rui Li commented on SPARK-2636: --- Just want to make sure I understand everything correctly: I think the user submits a job via an RDD action, which in turn calls {{SparkContext.runJob -> DAGScheduler.runJob -> DAGScheduler.submitJob -> DAGScheduler.handleJobSubmitted}}. The requirement is that we should return some job ID to the user. So I think putting that in a DAGScheduler method doesn't help? BTW, {{DAGScheduler.submitJob}} returns a {{JobWaiter}} which contains the job ID. Also, by "job ID", do we mean {{org.apache.spark.streaming.scheduler.Job.id}} or {{org.apache.spark.scheduler.ActiveJob.jobId}}? Please let me know if I misunderstand anything. > nowhere to get job identifier while submitting spark job through Spark API > --- > > Key: SPARK-2636 > URL: https://issues.apache.org/jira/browse/SPARK-2636 > Project: Spark > Issue Type: New Feature > Components: Java API >Reporter: Chengxiang Li > Labels: hive > > In Hive on Spark, we want to track spark job status through the Spark API. The > basic idea is as follows: > # create a Hive-specific spark listener and register it to the spark listener > bus. > # the Hive-specific spark listener generates job status from spark listener events. > # the Hive driver tracks job status through the Hive-specific spark listener. > The current problem is that the Hive driver needs a job identifier to track a > specific job's status through the spark listener, but there is no Spark API to > get a job identifier (like a job ID) when submitting a spark job. > I think any other project that tries to track job status with the Spark API > would suffer from this as well. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
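For what it's worth, the listener-based tracking described above can be sketched as follows (hypothetical class name; a minimal illustration only, and note it still doesn't let the caller pair a job ID with the action that produced it, which is the gap this issue is about):
{code}
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerJobStart}

// Observes job IDs from listener-bus events.
class JobIdListener extends SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit =
    println(s"job started: ${jobStart.jobId}")
  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit =
    println(s"job ended: ${jobEnd.jobId}, result: ${jobEnd.jobResult}")
}

sc.addSparkListener(new JobIdListener)
sc.parallelize(1 to 10).count() // triggers a job whose ID the listener sees
{code}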
[jira] [Commented] (SPARK-3221) Support JRuby as a language for using Spark
[ https://issues.apache.org/jira/browse/SPARK-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110163#comment-14110163 ] Rasik Pandey commented on SPARK-3221: - Currently this isn't possible due to closure and object serialization limitations, but since JRuby is a JVM language that has closures this should be possible. Spark would have to be updated to support JRuby serialization/deserialization or marshal/unmarshal of JRuby objects that aren't necessarily backed by class files. For example, the current ClosureCleaner code expects to resolve actual class files, yet in JRuby class files don't always exist for objects. > Support JRuby as a language for using Spark > --- > > Key: SPARK-3221 > URL: https://issues.apache.org/jira/browse/SPARK-3221 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 1.0.2 >Reporter: Rasik Pandey > -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3221) Support JRuby as a language for using Spark
Rasik Pandey created SPARK-3221: --- Summary: Support JRuby as a language for using Spark Key: SPARK-3221 URL: https://issues.apache.org/jira/browse/SPARK-3221 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.0.2 Reporter: Rasik Pandey -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2839) Documentation for statistical functions
[ https://issues.apache.org/jira/browse/SPARK-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110154#comment-14110154 ] Apache Spark commented on SPARK-2839: - User 'brkyvz' has created a pull request for this issue: https://github.com/apache/spark/pull/2123 > Documentation for statistical functions > --- > > Key: SPARK-2839 > URL: https://issues.apache.org/jira/browse/SPARK-2839 > Project: Spark > Issue Type: Sub-task > Components: Documentation, MLlib >Reporter: Xiangrui Meng >Assignee: Burak Yavuz > > Add documentation and code examples for statistical functions to MLlib's > programming guide. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
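A flavor of what the requested documentation might show, using the {{org.apache.spark.mllib.stat.Statistics}} API the guide covers (the data here is made up):
{code}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.Statistics

val observations = sc.parallelize(Seq(
  Vectors.dense(1.0, 10.0),
  Vectors.dense(2.0, 20.0),
  Vectors.dense(3.0, 30.0)))

// Column-wise summary statistics.
val summary = Statistics.colStats(observations)
println(summary.mean)     // per-column means
println(summary.variance) // per-column variances

// Pearson correlation matrix between columns.
println(Statistics.corr(observations))
{code}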
[jira] [Comment Edited] (SPARK-3213) spark_ec2.py cannot find slave instances launched with "Launch More Like This"
[ https://issues.apache.org/jira/browse/SPARK-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110149#comment-14110149 ] Vida Ha edited comment on SPARK-3213 at 8/26/14 1:51 AM: - Hi Joseph, Can you tell me more about how you launched these, without copying the tags? I used "Launch More Like This", and the name and tags were copied over correctly - see my screenshot above. I'm wondering if maybe, when you were using EC2, you could have been so unlucky as to have triggered a temporary outage in copying tags... Let's sync up in person tomorrow and figure out if this was a one-time problem or happens each time "Launch More Like This" is used or perhaps if we used different ways to launch more slaves. was (Author: vidaha): Hi Joseph, Can you tell me more about how you launched these, without copying the tags? I used "Launch More Like This", and the name and tags were copied over correctly. I'm wondering if maybe, when you were using EC2, you could have been so unlucky as to have triggered a temporary outage in copying tags... Let's sync up in person tomorrow and figure out if this was a one-time problem or happens each time "Launch More Like This" is used. > spark_ec2.py cannot find slave instances launched with "Launch More Like This" > -- > > Key: SPARK-3213 > URL: https://issues.apache.org/jira/browse/SPARK-3213 > Project: Spark > Issue Type: Bug > Components: EC2 >Affects Versions: 1.1.0 >Reporter: Joseph K. Bradley >Priority: Blocker > Attachments: Screen Shot 2014-08-25 at 6.45.35 PM.png > > > spark_ec2.py cannot find all slave instances. In particular: > * I created a master & slave and configured them. > * I created new slave instances from the original slave ("Launch More Like > This"). > * I tried to relaunch the cluster, and it could only find the original slave. > Old versions of the script worked. The latest working commit which edited > that .py script is: a0bcbc159e89be868ccc96175dbf1439461557e1 > There may be a problem with this PR: > [https://github.com/apache/spark/pull/1899]. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3213) spark_ec2.py cannot find slave instances launched with "Launch More Like This"
[ https://issues.apache.org/jira/browse/SPARK-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vida Ha updated SPARK-3213: --- Attachment: Screen Shot 2014-08-25 at 6.45.35 PM.png > spark_ec2.py cannot find slave instances launched with "Launch More Like This" > -- > > Key: SPARK-3213 > URL: https://issues.apache.org/jira/browse/SPARK-3213 > Project: Spark > Issue Type: Bug > Components: EC2 >Affects Versions: 1.1.0 >Reporter: Joseph K. Bradley >Priority: Blocker > Attachments: Screen Shot 2014-08-25 at 6.45.35 PM.png > > > spark_ec2.py cannot find all slave instances. In particular: > * I created a master & slave and configured them. > * I created new slave instances from the original slave ("Launch More Like > This"). > * I tried to relaunch the cluster, and it could only find the original slave. > Old versions of the script worked. The latest working commit which edited > that .py script is: a0bcbc159e89be868ccc96175dbf1439461557e1 > There may be a problem with this PR: > [https://github.com/apache/spark/pull/1899]. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3213) spark_ec2.py cannot find slave instances launched with "Launch More Like This"
[ https://issues.apache.org/jira/browse/SPARK-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110149#comment-14110149 ] Vida Ha commented on SPARK-3213: Hi Joseph, Can you tell me more about how you launched these, without copying the tags? I used "Launch More Like This", and the name and tags were copied over correctly. I'm wondering if maybe, when you were using EC2, you could have been so unlucky as to have triggered a temporary outage in copying tags... Let's sync up in person tomorrow and figure out if this was a one-time problem or happens each time "Launch More Like This" is used. > spark_ec2.py cannot find slave instances launched with "Launch More Like This" > -- > > Key: SPARK-3213 > URL: https://issues.apache.org/jira/browse/SPARK-3213 > Project: Spark > Issue Type: Bug > Components: EC2 >Affects Versions: 1.1.0 >Reporter: Joseph K. Bradley >Priority: Blocker > > spark_ec2.py cannot find all slave instances. In particular: > * I created a master & slave and configured them. > * I created new slave instances from the original slave ("Launch More Like > This"). > * I tried to relaunch the cluster, and it could only find the original slave. > Old versions of the script worked. The latest working commit which edited > that .py script is: a0bcbc159e89be868ccc96175dbf1439461557e1 > There may be a problem with this PR: > [https://github.com/apache/spark/pull/1899]. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-3213) spark_ec2.py cannot find slave instances launched with "Launch More Like This"
[ https://issues.apache.org/jira/browse/SPARK-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110149#comment-14110149 ] Vida Ha edited comment on SPARK-3213 at 8/26/14 1:49 AM: - Hi Joseph, Can you tell me more about how you launched these, without copying the tags? I used "Launch More Like This", and the name and tags were copied over correctly. I'm wondering if maybe, when you were using EC2, you could have been so unlucky as to have triggered a temporary outage in copying tags... Let's sync up in person tomorrow and figure out if this was a one-time problem or happens each time "Launch More Like This" is used. was (Author: vidaha): Hi Joseph, Can you tell me more about how you launched these, without copying the tags? I used "Launch More Like This", and the name and tags were copied over correctly. I'm wondering if maybe, when you were using EC2, you could have been so unlucky as to have triggered a temporary outage in copying tags... Let's sync up in person tomorrow and figure out if this was a one-time problem or happens each time "Launch More Like This" is used. > spark_ec2.py cannot find slave instances launched with "Launch More Like This" > -- > > Key: SPARK-3213 > URL: https://issues.apache.org/jira/browse/SPARK-3213 > Project: Spark > Issue Type: Bug > Components: EC2 >Affects Versions: 1.1.0 >Reporter: Joseph K. Bradley >Priority: Blocker > > spark_ec2.py cannot find all slave instances. In particular: > * I created a master & slave and configured them. > * I created new slave instances from the original slave ("Launch More Like > This"). > * I tried to relaunch the cluster, and it could only find the original slave. > Old versions of the script worked. The latest working commit which edited > that .py script is: a0bcbc159e89be868ccc96175dbf1439461557e1 > There may be a problem with this PR: > [https://github.com/apache/spark/pull/1899]. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3217) Shaded Guava jar doesn't play well with Maven build
[ https://issues.apache.org/jira/browse/SPARK-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110142#comment-14110142 ] Cheng Lian commented on SPARK-3217: --- [~vanzin] Thanks, I did set {{SPARK_PREPEND_CLASSES}}. Will change the title and description of this issue after verifying it. > Shaded Guava jar doesn't play well with Maven build > --- > > Key: SPARK-3217 > URL: https://issues.apache.org/jira/browse/SPARK-3217 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.2.0 >Reporter: Cheng Lian >Priority: Blocker > > PR [#1813|https://github.com/apache/spark/pull/1813] shaded Guava jar file > and moved Guava classes to package {{org.spark-project.guava}} when Spark is > built by Maven. But code in {{org.apache.spark.util.Utils}} still refers to > classes (e.g. {{ThreadFactoryBuilder}}) in package {{com.google.common}}. > The result is that, when Spark is built with Maven (or > {{make-distribution.sh}}), commands like {{bin/spark-shell}} throws > {{ClassNotFoundException}}: > {code} > # Build Spark with Maven > $ mvn clean package -Phive,hadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests > ... > # Then spark-shell complains > $ ./bin/spark-shell > Spark assembly has been built with Hive, including Datanucleus jars on > classpath > Exception in thread "main" java.lang.NoClassDefFoundError: > com/google/common/util/concurrent/ThreadFactoryBuilder > at org.apache.spark.util.Utils$.(Utils.scala:636) > at org.apache.spark.util.Utils$.(Utils.scala) > at org.apache.spark.repl.SparkILoop.(SparkILoop.scala:134) > at org.apache.spark.repl.SparkILoop.(SparkILoop.scala:65) > at org.apache.spark.repl.Main$.main(Main.scala:30) > at org.apache.spark.repl.Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:317) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.ClassNotFoundException: > com.google.common.util.concurrent.ThreadFactoryBuilder > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > ... 13 more > # Check the assembly jar file > $ jar tf > assembly/target/scala-2.10/spark-assembly-1.1.0-SNAPSHOT-hadoop2.3.0.jar | > grep -i ThreadFactoryBuilder > org/spark-project/guava/common/util/concurrent/ThreadFactoryBuilder$1.class > org/spark-project/guava/common/util/concurrent/ThreadFactoryBuilder.class > {code} > SBT build is fine since we don't shade Guava with SBT right now (and that's > why Jenkins didn't complain about this). > Possible solutions can be: > # revert PR #1813 for safe, or > # also shade Guava in SBT build and only use {{org.spark-project.guava}} in > Spark -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3220) K-Means clusterer should perform K-Means initialization in parallel
Derrick Burns created SPARK-3220: Summary: K-Means clusterer should perform K-Means initialization in parallel Key: SPARK-3220 URL: https://issues.apache.org/jira/browse/SPARK-3220 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Derrick Burns The LocalKMeans method should be replaced with a parallel implementation. As it stands now, it becomes a bottleneck for large data sets. I have implemented this functionality in my version of the clusterer. However, I see that there are hundreds of outstanding pull requests. If someone on the team wants to sponsor the pull request, I will create one. Otherwise, I will just maintain my own private fork of the clusterer. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
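For a concrete flavor of the proposal, here is a rough sketch (assumed point representation and helper names; this is neither Derrick's code nor MLlib's) of choosing one seed center with the per-point costs computed in parallel on the cluster rather than in a local loop:
{code}
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

def squaredDistance(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

// k-means++-style step: pick the next center with probability proportional
// to each point's squared distance to its nearest existing center.
def pickNextCenter(points: RDD[Array[Double]],
                   centers: Seq[Array[Double]],
                   seed: Long): Array[Double] = {
  val costs = points.map(p => (p, centers.map(c => squaredDistance(p, c)).min)).cache()
  val total = costs.map(_._2).sum()
  val threshold = new scala.util.Random(seed).nextDouble() * total
  var cum = 0.0
  var chosen = centers.head
  val it = costs.toLocalIterator // streams partitions to the driver one at a time
  while (cum < threshold && it.hasNext) {
    val (p, c) = it.next(); cum += c; chosen = p
  }
  costs.unpersist()
  chosen
}
{code}
The expensive part (evaluating every point against every current center) runs on the cluster; only the final weighted draw streams through the driver.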
[jira] [Commented] (SPARK-2921) Mesos doesn't handle spark.executor.extraJavaOptions correctly (among other things)
[ https://issues.apache.org/jira/browse/SPARK-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110127#comment-14110127 ] Cheng Lian commented on SPARK-2921: --- [~andrewor14] {{spark.executor.extraLibraryPath}} is affected. But {{spark.executor.extraClassPath}} should be OK since it's eventually added to the environment variable {{SPARK_CLASSPATH}}. > Mesos doesn't handle spark.executor.extraJavaOptions correctly (among other > things) > --- > > Key: SPARK-2921 > URL: https://issues.apache.org/jira/browse/SPARK-2921 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.0.2 >Reporter: Andrew Or >Priority: Blocker > Fix For: 1.1.0 > > > The code path to handle this exists only for the coarse-grained mode, and > even in this mode the java options aren't passed to the executors properly. > We currently pass the entire value of spark.executor.extraJavaOptions to the > executors as a string without splitting it. We need to use > Utils.splitCommandString as in standalone mode. > I have not confirmed this, but I would assume spark.executor.extraClassPath > and spark.executor.extraLibraryPath are also not propagated correctly in > either mode. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3219) K-Means clusterer should support Bregman distance metrics
Derrick Burns created SPARK-3219: Summary: K-Means clusterer should support Bregman distance metrics Key: SPARK-3219 URL: https://issues.apache.org/jira/browse/SPARK-3219 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Derrick Burns The K-Means clusterer supports the Euclidean distance metric. However, it is rather straightforward to support Bregman (http://machinelearning.wustl.edu/mlpapers/paper_files/BanerjeeMDG05.pdf) distance functions which would increase the utility of the clusterer tremendously. I have modified the clusterer to support pluggable distance functions. However, I notice that there are hundreds of outstanding pull requests. If someone is willing to work with me to sponsor the work through the process, I will create a pull request. Otherwise, I will just keep my own fork. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
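A sketch of what a pluggable divergence abstraction could look like (hypothetical trait and object names, not the actual modification described above):
{code}
// A Bregman divergence generalizes squared Euclidean distance; per Banerjee
// et al. (2005), the cluster mean remains the optimal center for any member
// of the family, so the k-means update step carries over unchanged.
trait BregmanDivergence extends Serializable {
  def divergence(x: Array[Double], y: Array[Double]): Double
}

object SquaredEuclidean extends BregmanDivergence {
  def divergence(x: Array[Double], y: Array[Double]): Double =
    x.zip(y).map { case (a, b) => (a - b) * (a - b) }.sum
}

// Generalized KL divergence, another member of the Bregman family
// (assumes strictly positive coordinates).
object GeneralizedKL extends BregmanDivergence {
  def divergence(x: Array[Double], y: Array[Double]): Double =
    x.zip(y).map { case (a, b) => a * math.log(a / b) - a + b }.sum
}
{code}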
[jira] [Created] (SPARK-3218) K-Means clusterer can fail on degenerate data
Derrick Burns created SPARK-3218: Summary: K-Means clusterer can fail on degenerate data Key: SPARK-3218 URL: https://issues.apache.org/jira/browse/SPARK-3218 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.0.2 Reporter: Derrick Burns The KMeans parallel implementation selects points to be cluster centers with probability weighted by their distance to cluster centers. However, if there are fewer than k DISTINCT points in the data set, this approach will fail. Further, the recent checkin to work around this problem results in selection of the same point repeatedly as a cluster center. The fix is to allow fewer than k cluster centers to be selected. This requires several changes to the code, as the number of cluster centers is woven into the implementation. I have a version of the code that addresses this problem, AND generalizes the distance metric. However, I see that there are literally hundreds of outstanding pull requests. If someone will commit to working with me to sponsor the pull request, I will create it. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
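A minimal repro sketch of the degenerate case (made-up data, using the public {{KMeans.train}} API): only two distinct points but k = 5, so five distinct centers cannot exist.
{code}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// 200 points but only 2 distinct values: fewer distinct points than k.
val data = sc.parallelize(
  Seq.fill(100)(Vectors.dense(1.0)) ++ Seq.fill(100)(Vectors.dense(2.0)))

// Asking for 5 clusters can only produce repeated centers (or fail),
// which is the behavior this issue describes.
val model = KMeans.train(data, 5, 10)
model.clusterCenters.foreach(println)
{code}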
[jira] [Reopened] (SPARK-3193) output error info when Process exitcode not zero
[ https://issues.apache.org/jira/browse/SPARK-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei reopened SPARK-3193: > output error info when Process exitcode not zero > > > Key: SPARK-3193 > URL: https://issues.apache.org/jira/browse/SPARK-3193 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.0.2 >Reporter: wangfei > > I noticed that sometimes pr tests failed due to the Process exitcode != 0: > DriverSuite: > Spark assembly has been built with Hive, including Datanucleus jars on > classpath > - driver should exit after finishing *** FAILED *** >SparkException was thrown during property evaluation. > (DriverSuite.scala:40) > Message: Process List(./bin/spark-class, > org.apache.spark.DriverWithoutCleanup, local) exited with code 1 > Occurred at table row 0 (zero based, not counting headings), which had > values ( >master = local > ) > > [info] SparkSubmitSuite: > [info] - prints usage on empty input > [info] - prints usage with only --help > [info] - prints error with unrecognized options > [info] - handle binary specified but not class > [info] - handles arguments with --key=val > [info] - handles arguments to user program > [info] - handles arguments to user program with name collision > [info] - handles YARN cluster mode > [info] - handles YARN client mode > [info] - handles standalone cluster mode > [info] - handles standalone client mode > [info] - handles mesos client mode > [info] - handles confs with flag equivalents > [info] - launch simple application with spark-submit *** FAILED *** > [info] org.apache.spark.SparkException: Process List(./bin/spark-submit, > --class, org.apache.spark.deploy.SimpleApplicationTest, --name, testApp, > --master, local, file:/tmp/1408854098404-0/testJar-1408854098404.jar) exited > with code 1 > [info] at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:872) > [info] at > org.apache.spark.deploy.SparkSubmitSuite.runSparkSubmit(SparkSubmitSuite.scala:311) > [info] at > org.apache.spark.deploy.SparkSubmitSuite$$anonfun$14.apply$mcV$sp(SparkSubmitSuite.scala:291) > [info] at > org.apache.spark.deploy.SparkSubmitSuite$$anonfun$14.apply(SparkSubmitSuite.scala:284) > [info] at org.apacSpark assembly has been built with Hive, including > Datanucleus jars on classpath > refer to > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18688/consoleFull > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19118/consoleFull > we should output the process error info when it fails; this can be helpful for > diagnosis. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
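The kind of diagnostics being requested can be sketched with scala.sys.process (illustrative only; the command is made up and this is not the {{Utils.executeAndGetOutput}} code):
{code}
import scala.sys.process._

// Capture stderr while the child process runs, then surface it on a
// non-zero exit instead of reporting only the exit code.
val stderr = new StringBuilder
val logger = ProcessLogger(_ => (), line => stderr.append(line).append('\n'))
val exitCode = Seq("ls", "/no/such/path").!(logger)
if (exitCode != 0) {
  sys.error(s"Process exited with code $exitCode; stderr:\n$stderr")
}
{code}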
[jira] [Commented] (SPARK-3178) setting SPARK_WORKER_MEMORY to a value without a label (m or g) sets the worker memory limit to zero
[ https://issues.apache.org/jira/browse/SPARK-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110116#comment-14110116 ] Helena Edelson commented on SPARK-3178: --- +1, it doesn't look like the input value is validated to fail fast if m/g is not specified > setting SPARK_WORKER_MEMORY to a value without a label (m or g) sets the > worker memory limit to zero > > > Key: SPARK-3178 > URL: https://issues.apache.org/jira/browse/SPARK-3178 > Project: Spark > Issue Type: Bug > Environment: osx >Reporter: Jon Haddad > > This should either default to m or just completely fail. Starting a worker > with zero memory isn't very helpful. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
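A fail-fast validation along the lines suggested might look like this (hypothetical helper, not Spark's actual parsing code):
{code}
// Reject memory strings that lack an explicit unit instead of silently
// treating them as zero memory.
def parseWorkerMemoryMb(s: String): Int = {
  val v = s.trim.toLowerCase
  if (v.endsWith("g")) v.dropRight(1).toInt * 1024
  else if (v.endsWith("m")) v.dropRight(1).toInt
  else throw new IllegalArgumentException(
    s"SPARK_WORKER_MEMORY must end in 'm' or 'g' (e.g. 2g), got: '$s'")
}

parseWorkerMemoryMb("2g")   // 2048
parseWorkerMemoryMb("2048") // throws instead of starting a zero-memory worker
{code}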
[jira] [Updated] (SPARK-3217) Shaded Guava jar doesn't play well with Maven build
[ https://issues.apache.org/jira/browse/SPARK-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3217: --- Affects Version/s: 1.2.0 > Shaded Guava jar doesn't play well with Maven build > --- > > Key: SPARK-3217 > URL: https://issues.apache.org/jira/browse/SPARK-3217 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.2.0 >Reporter: Cheng Lian >Priority: Blocker > > PR [#1813|https://github.com/apache/spark/pull/1813] shaded Guava jar file > and moved Guava classes to package {{org.spark-project.guava}} when Spark is > built by Maven. But code in {{org.apache.spark.util.Utils}} still refers to > classes (e.g. {{ThreadFactoryBuilder}}) in package {{com.google.common}}. > The result is that, when Spark is built with Maven (or > {{make-distribution.sh}}), commands like {{bin/spark-shell}} throws > {{ClassNotFoundException}}: > {code} > # Build Spark with Maven > $ mvn clean package -Phive,hadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests > ... > # Then spark-shell complains > $ ./bin/spark-shell > Spark assembly has been built with Hive, including Datanucleus jars on > classpath > Exception in thread "main" java.lang.NoClassDefFoundError: > com/google/common/util/concurrent/ThreadFactoryBuilder > at org.apache.spark.util.Utils$.(Utils.scala:636) > at org.apache.spark.util.Utils$.(Utils.scala) > at org.apache.spark.repl.SparkILoop.(SparkILoop.scala:134) > at org.apache.spark.repl.SparkILoop.(SparkILoop.scala:65) > at org.apache.spark.repl.Main$.main(Main.scala:30) > at org.apache.spark.repl.Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:317) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.ClassNotFoundException: > com.google.common.util.concurrent.ThreadFactoryBuilder > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > ... 13 more > # Check the assembly jar file > $ jar tf > assembly/target/scala-2.10/spark-assembly-1.1.0-SNAPSHOT-hadoop2.3.0.jar | > grep -i ThreadFactoryBuilder > org/spark-project/guava/common/util/concurrent/ThreadFactoryBuilder$1.class > org/spark-project/guava/common/util/concurrent/ThreadFactoryBuilder.class > {code} > SBT build is fine since we don't shade Guava with SBT right now (and that's > why Jenkins didn't complain about this). > Possible solutions can be: > # revert PR #1813 for safe, or > # also shade Guava in SBT build and only use {{org.spark-project.guava}} in > Spark -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3217) Shaded Guava jar doesn't play well with Maven build
[ https://issues.apache.org/jira/browse/SPARK-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3217: --- Labels: (was: 1.2.0) > Shaded Guava jar doesn't play well with Maven build > --- > > Key: SPARK-3217 > URL: https://issues.apache.org/jira/browse/SPARK-3217 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.2.0 >Reporter: Cheng Lian >Priority: Blocker > > PR [#1813|https://github.com/apache/spark/pull/1813] shaded Guava jar file > and moved Guava classes to package {{org.spark-project.guava}} when Spark is > built by Maven. But code in {{org.apache.spark.util.Utils}} still refers to > classes (e.g. {{ThreadFactoryBuilder}}) in package {{com.google.common}}. > The result is that, when Spark is built with Maven (or > {{make-distribution.sh}}), commands like {{bin/spark-shell}} throws > {{ClassNotFoundException}}: > {code} > # Build Spark with Maven > $ mvn clean package -Phive,hadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests > ... > # Then spark-shell complains > $ ./bin/spark-shell > Spark assembly has been built with Hive, including Datanucleus jars on > classpath > Exception in thread "main" java.lang.NoClassDefFoundError: > com/google/common/util/concurrent/ThreadFactoryBuilder > at org.apache.spark.util.Utils$.(Utils.scala:636) > at org.apache.spark.util.Utils$.(Utils.scala) > at org.apache.spark.repl.SparkILoop.(SparkILoop.scala:134) > at org.apache.spark.repl.SparkILoop.(SparkILoop.scala:65) > at org.apache.spark.repl.Main$.main(Main.scala:30) > at org.apache.spark.repl.Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:317) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.ClassNotFoundException: > com.google.common.util.concurrent.ThreadFactoryBuilder > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > ... 13 more > # Check the assembly jar file > $ jar tf > assembly/target/scala-2.10/spark-assembly-1.1.0-SNAPSHOT-hadoop2.3.0.jar | > grep -i ThreadFactoryBuilder > org/spark-project/guava/common/util/concurrent/ThreadFactoryBuilder$1.class > org/spark-project/guava/common/util/concurrent/ThreadFactoryBuilder.class > {code} > SBT build is fine since we don't shade Guava with SBT right now (and that's > why Jenkins didn't complain about this). > Possible solutions can be: > # revert PR #1813 for safe, or > # also shade Guava in SBT build and only use {{org.spark-project.guava}} in > Spark -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3058) Support EXTENDED for EXPLAIN command
[ https://issues.apache.org/jira/browse/SPARK-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-3058. - Resolution: Fixed Fix Version/s: 1.1.0 > Support EXTENDED for EXPLAIN command > > > Key: SPARK-3058 > URL: https://issues.apache.org/jira/browse/SPARK-3058 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Cheng Hao >Assignee: Cheng Hao >Priority: Minor > Fix For: 1.1.0 > > > Currently, there is no difference when running the "EXPLAIN" command with or > without the "EXTENDED" keyword; this patch shows more details of the query plan > when the "EXTENDED" keyword is provided. > {panel:title=EXPLAIN with EXTENDED} > explain extended select key as a1, value as a2 from src where key=1; > == Parsed Logical Plan == > Project ['key AS a1#3,'value AS a2#4] > Filter ('key = 1) > UnresolvedRelation None, src, None > == Analyzed Logical Plan == > Project [key#8 AS a1#3,value#9 AS a2#4] > Filter (CAST(key#8, DoubleType) = CAST(1, DoubleType)) > MetastoreRelation default, src, None > == Optimized Logical Plan == > Project [key#8 AS a1#3,value#9 AS a2#4] > Filter (CAST(key#8, DoubleType) = 1.0) > MetastoreRelation default, src, None > == Physical Plan == > Project [key#8 AS a1#3,value#9 AS a2#4] > Filter (CAST(key#8, DoubleType) = 1.0) > HiveTableScan [key#8,value#9], (MetastoreRelation default, src, None), None > Code Generation: false > == RDD == > (2) MappedRDD[14] at map at HiveContext.scala:350 > MapPartitionsRDD[13] at mapPartitions at basicOperators.scala:42 > MapPartitionsRDD[12] at mapPartitions at basicOperators.scala:57 > MapPartitionsRDD[11] at mapPartitions at TableReader.scala:112 > MappedRDD[10] at map at TableReader.scala:240 > HadoopRDD[9] at HadoopRDD at TableReader.scala:230 > {panel} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3217) Shaded Guava jar doesn't play well with Maven build
[ https://issues.apache.org/jira/browse/SPARK-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-3217: -- Target Version/s: 1.2.0 (was: 1.1.0) > Shaded Guava jar doesn't play well with Maven build > --- > > Key: SPARK-3217 > URL: https://issues.apache.org/jira/browse/SPARK-3217 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: Cheng Lian >Priority: Blocker > Labels: 1.2.0 > > PR [#1813|https://github.com/apache/spark/pull/1813] shaded Guava jar file > and moved Guava classes to package {{org.spark-project.guava}} when Spark is > built by Maven. But code in {{org.apache.spark.util.Utils}} still refers to > classes (e.g. {{ThreadFactoryBuilder}}) in package {{com.google.common}}. > The result is that, when Spark is built with Maven (or > {{make-distribution.sh}}), commands like {{bin/spark-shell}} throws > {{ClassNotFoundException}}: > {code} > # Build Spark with Maven > $ mvn clean package -Phive,hadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests > ... > # Then spark-shell complains > $ ./bin/spark-shell > Spark assembly has been built with Hive, including Datanucleus jars on > classpath > Exception in thread "main" java.lang.NoClassDefFoundError: > com/google/common/util/concurrent/ThreadFactoryBuilder > at org.apache.spark.util.Utils$.(Utils.scala:636) > at org.apache.spark.util.Utils$.(Utils.scala) > at org.apache.spark.repl.SparkILoop.(SparkILoop.scala:134) > at org.apache.spark.repl.SparkILoop.(SparkILoop.scala:65) > at org.apache.spark.repl.Main$.main(Main.scala:30) > at org.apache.spark.repl.Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:317) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.ClassNotFoundException: > com.google.common.util.concurrent.ThreadFactoryBuilder > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > ... 13 more > # Check the assembly jar file > $ jar tf > assembly/target/scala-2.10/spark-assembly-1.1.0-SNAPSHOT-hadoop2.3.0.jar | > grep -i ThreadFactoryBuilder > org/spark-project/guava/common/util/concurrent/ThreadFactoryBuilder$1.class > org/spark-project/guava/common/util/concurrent/ThreadFactoryBuilder.class > {code} > SBT build is fine since we don't shade Guava with SBT right now (and that's > why Jenkins didn't complain about this). > Possible solutions can be: > # revert PR #1813 for safe, or > # also shade Guava in SBT build and only use {{org.spark-project.guava}} in > Spark -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3217) Shaded Guava jar doesn't play well with Maven build
[ https://issues.apache.org/jira/browse/SPARK-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-3217: -- Labels: 1.2.0 (was: ) > Shaded Guava jar doesn't play well with Maven build > --- > > Key: SPARK-3217 > URL: https://issues.apache.org/jira/browse/SPARK-3217 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: Cheng Lian >Priority: Blocker > Labels: 1.2.0 > > PR [#1813|https://github.com/apache/spark/pull/1813] shaded Guava jar file > and moved Guava classes to package {{org.spark-project.guava}} when Spark is > built by Maven. But code in {{org.apache.spark.util.Utils}} still refers to > classes (e.g. {{ThreadFactoryBuilder}}) in package {{com.google.common}}. > The result is that, when Spark is built with Maven (or > {{make-distribution.sh}}), commands like {{bin/spark-shell}} throws > {{ClassNotFoundException}}: > {code} > # Build Spark with Maven > $ mvn clean package -Phive,hadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests > ... > # Then spark-shell complains > $ ./bin/spark-shell > Spark assembly has been built with Hive, including Datanucleus jars on > classpath > Exception in thread "main" java.lang.NoClassDefFoundError: > com/google/common/util/concurrent/ThreadFactoryBuilder > at org.apache.spark.util.Utils$.(Utils.scala:636) > at org.apache.spark.util.Utils$.(Utils.scala) > at org.apache.spark.repl.SparkILoop.(SparkILoop.scala:134) > at org.apache.spark.repl.SparkILoop.(SparkILoop.scala:65) > at org.apache.spark.repl.Main$.main(Main.scala:30) > at org.apache.spark.repl.Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:317) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.ClassNotFoundException: > com.google.common.util.concurrent.ThreadFactoryBuilder > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > ... 13 more > # Check the assembly jar file > $ jar tf > assembly/target/scala-2.10/spark-assembly-1.1.0-SNAPSHOT-hadoop2.3.0.jar | > grep -i ThreadFactoryBuilder > org/spark-project/guava/common/util/concurrent/ThreadFactoryBuilder$1.class > org/spark-project/guava/common/util/concurrent/ThreadFactoryBuilder.class > {code} > SBT build is fine since we don't shade Guava with SBT right now (and that's > why Jenkins didn't complain about this). > Possible solutions can be: > # revert PR #1813 for safe, or > # also shade Guava in SBT build and only use {{org.spark-project.guava}} in > Spark -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3217) Shaded Guava jar doesn't play well with Maven build
[ https://issues.apache.org/jira/browse/SPARK-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-3217: -- Affects Version/s: (was: 1.0.2) > Shaded Guava jar doesn't play well with Maven build > --- > > Key: SPARK-3217 > URL: https://issues.apache.org/jira/browse/SPARK-3217 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: Cheng Lian >Priority: Blocker > > PR [#1813|https://github.com/apache/spark/pull/1813] shaded Guava jar file > and moved Guava classes to package {{org.spark-project.guava}} when Spark is > built by Maven. But code in {{org.apache.spark.util.Utils}} still refers to > classes (e.g. {{ThreadFactoryBuilder}}) in package {{com.google.common}}. > The result is that, when Spark is built with Maven (or > {{make-distribution.sh}}), commands like {{bin/spark-shell}} throws > {{ClassNotFoundException}}: > {code} > # Build Spark with Maven > $ mvn clean package -Phive,hadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests > ... > # Then spark-shell complains > $ ./bin/spark-shell > Spark assembly has been built with Hive, including Datanucleus jars on > classpath > Exception in thread "main" java.lang.NoClassDefFoundError: > com/google/common/util/concurrent/ThreadFactoryBuilder > at org.apache.spark.util.Utils$.(Utils.scala:636) > at org.apache.spark.util.Utils$.(Utils.scala) > at org.apache.spark.repl.SparkILoop.(SparkILoop.scala:134) > at org.apache.spark.repl.SparkILoop.(SparkILoop.scala:65) > at org.apache.spark.repl.Main$.main(Main.scala:30) > at org.apache.spark.repl.Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:317) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.ClassNotFoundException: > com.google.common.util.concurrent.ThreadFactoryBuilder > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > ... 13 more > # Check the assembly jar file > $ jar tf > assembly/target/scala-2.10/spark-assembly-1.1.0-SNAPSHOT-hadoop2.3.0.jar | > grep -i ThreadFactoryBuilder > org/spark-project/guava/common/util/concurrent/ThreadFactoryBuilder$1.class > org/spark-project/guava/common/util/concurrent/ThreadFactoryBuilder.class > {code} > SBT build is fine since we don't shade Guava with SBT right now (and that's > why Jenkins didn't complain about this). > Possible solutions can be: > # revert PR #1813 for safe, or > # also shade Guava in SBT build and only use {{org.spark-project.guava}} in > Spark -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3217) Shaded Guava jar doesn't play well with Maven build
[ https://issues.apache.org/jira/browse/SPARK-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110035#comment-14110035 ] Marcelo Vanzin commented on SPARK-3217: --- Just did a "git clean -dfx" on master and rebuilt using maven. This works fine for me. Did you by any chance do one of the following: - forget to "clean" after pulling that change - mix sbt and mvn built artifacts in the same build - set SPARK_PREPEND_CLASSES I can see any of those causing this issue. I think only the last one is something we need to worry about; we now need to figure out a way to add the guava jar to the classpath when using that option. > Shaded Guava jar doesn't play well with Maven build > --- > > Key: SPARK-3217 > URL: https://issues.apache.org/jira/browse/SPARK-3217 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: Cheng Lian >Priority: Blocker > > PR [#1813|https://github.com/apache/spark/pull/1813] shaded Guava jar file > and moved Guava classes to package {{org.spark-project.guava}} when Spark is > built by Maven. But code in {{org.apache.spark.util.Utils}} still refers to > classes (e.g. {{ThreadFactoryBuilder}}) in package {{com.google.common}}. > The result is that, when Spark is built with Maven (or > {{make-distribution.sh}}), commands like {{bin/spark-shell}} throws > {{ClassNotFoundException}}: > {code} > # Build Spark with Maven > $ mvn clean package -Phive,hadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests > ... > # Then spark-shell complains > $ ./bin/spark-shell > Spark assembly has been built with Hive, including Datanucleus jars on > classpath > Exception in thread "main" java.lang.NoClassDefFoundError: > com/google/common/util/concurrent/ThreadFactoryBuilder > at org.apache.spark.util.Utils$.(Utils.scala:636) > at org.apache.spark.util.Utils$.(Utils.scala) > at org.apache.spark.repl.SparkILoop.(SparkILoop.scala:134) > at org.apache.spark.repl.SparkILoop.(SparkILoop.scala:65) > at org.apache.spark.repl.Main$.main(Main.scala:30) > at org.apache.spark.repl.Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:317) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.ClassNotFoundException: > com.google.common.util.concurrent.ThreadFactoryBuilder > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > ... 13 more > # Check the assembly jar file > $ jar tf > assembly/target/scala-2.10/spark-assembly-1.1.0-SNAPSHOT-hadoop2.3.0.jar | > grep -i ThreadFactoryBuilder > org/spark-project/guava/common/util/concurrent/ThreadFactoryBuilder$1.class > org/spark-project/guava/common/util/concurrent/ThreadFactoryBuilder.class > {code} > SBT build is fine since we don't shade Guava with SBT right now (and that's > why Jenkins didn't complain about this). > Possible solutions can be: > # revert PR #1813 for safe, or > # also shade Guava in SBT build and only use {{org.spark-project.guava}} in > Spark -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2087) Clean Multi-user semantics for thrift JDBC/ODBC server.
[ https://issues.apache.org/jira/browse/SPARK-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110026#comment-14110026 ] Yi Tian commented on SPARK-2087: You mean the "CACHE TABLE ... AS SELECT..." syntax will create a temporary table that cannot be found by other sessions? I'm still confused about the difference between temporary tables and cached tables. > Clean Multi-user semantics for thrift JDBC/ODBC server. > --- > > Key: SPARK-2087 > URL: https://issues.apache.org/jira/browse/SPARK-2087 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Michael Armbrust >Assignee: Zongheng Yang >Priority: Minor > > Configuration and temporary tables should exist per-user. Cached tables > should be shared across users. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
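For context, the syntax under discussion looks like this (hypothetical table names, run through a SQLContext of this era):
{code}
// Creates and caches a table from a query in one statement. The open
// question in this issue is whether such a table should be per-session
// (temporary-table semantics) or shared across users (cached-table semantics).
sqlContext.sql(
  "CACHE TABLE recent_logs AS SELECT * FROM logs WHERE ds = '2014-08-25'")
{code}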
[jira] [Updated] (SPARK-3061) Maven build fails in Windows OS
[ https://issues.apache.org/jira/browse/SPARK-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-3061: -- Affects Version/s: 1.1.0 Maybe we can use a Maven plugin to unzip? http://stackoverflow.com/questions/3264064/unpack-zip-in-zip-with-maven > Maven build fails in Windows OS > --- > > Key: SPARK-3061 > URL: https://issues.apache.org/jira/browse/SPARK-3061 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.0.2, 1.1.0 > Environment: Windows >Reporter: Masayoshi TSUZUKI >Priority: Minor > > Maven build fails in Windows OS with this error message. > {noformat} > [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:exec > (default) on project spark-core_2.10: Command execution failed. Cannot run > program "unzip" (in directory "C:\path\to\gitofspark\python"): CreateProcess > error=2, 指定されたファイルが見つかりません (the system cannot find the file specified) -> [Help 1] > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3179) Add task OutputMetrics
[ https://issues.apache.org/jira/browse/SPARK-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109997#comment-14109997 ] Michael Yannakopoulos commented on SPARK-3179: -- Hi Sandy, I am willing to help with this issue. I am new to Apache Spark and have made a few contributions so far. Under your supervision I can work on this issue. Thanks, Michael > Add task OutputMetrics > -- > > Key: SPARK-3179 > URL: https://issues.apache.org/jira/browse/SPARK-3179 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Sandy Ryza > > Track the bytes that tasks write to HDFS or other output destinations. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-2929) Rewrite HiveThriftServer2Suite and CliSuite
[ https://issues.apache.org/jira/browse/SPARK-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-2929. - Resolution: Fixed Fix Version/s: 1.1.0 > Rewrite HiveThriftServer2Suite and CliSuite > --- > > Key: SPARK-2929 > URL: https://issues.apache.org/jira/browse/SPARK-2929 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.0.1, 1.0.2 >Reporter: Cheng Lian >Assignee: Cheng Lian > Fix For: 1.1.0 > > > {{HiveThriftServer2Suite}} and {{CliSuite}} were inherited from Shark and > contain too many hard-coded timeouts and timing assumptions when doing IPC. > This makes these tests both flaky and slow. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3204) MaxOf would be foldable if both left and right are foldable.
[ https://issues.apache.org/jira/browse/SPARK-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-3204. - Resolution: Fixed Fix Version/s: 1.1.0 Assignee: Takuya Ueshin > MaxOf would be foldable if both left and right are foldable. > > > Key: SPARK-3204 > URL: https://issues.apache.org/jira/browse/SPARK-3204 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin > Fix For: 1.1.0 > > -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
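The change is easy to picture with a simplified expression tree (illustrative types only, not Catalyst's real classes):
{code}
// An expression is foldable when the optimizer can evaluate it to a constant
// at planning time; that holds for MaxOf exactly when both children are
// themselves foldable, which is what this issue changed.
trait Expr { def foldable: Boolean }
case class Literal(value: Any) extends Expr { val foldable = true }
case class MaxOf(left: Expr, right: Expr) extends Expr {
  val foldable: Boolean = left.foldable && right.foldable
}

MaxOf(Literal(1), Literal(2)).foldable // true: can be constant-folded
{code}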
[jira] [Updated] (SPARK-3188) Add Robust Regression Algorithm with Tukey bisquare weight function (Biweight Estimates)
[ https://issues.apache.org/jira/browse/SPARK-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Jiang updated SPARK-3188: - Description: Linear least squares estimates assume the error has a normal distribution and can behave badly when the errors are heavy-tailed. In practice we get various types of data. We need to include Robust Regression to employ a fitting criterion that is not as vulnerable as least squares. The Tukey bisquare weight function, also referred to as the biweight function, produces an M-estimator that is more resistant to regression outliers than the Huber M-estimator (Andersen 2008: 19). was: Linear least squares estimates assume the error has a normal distribution and can behave badly when the errors are heavy-tailed. In practice we get various types of data. We need to include Robust Regression to employ a fitting criterion that is not as vulnerable as least squares. The Turkey bisquare weight function, also referred to as the biweight function, produces an M-estimator that is more resistant to regression outliers than the Huber M-estimator (Andersen 2008: 19). > Add Robust Regression Algorithm with Tukey bisquare weight function > (Biweight Estimates) > -- > > Key: SPARK-3188 > URL: https://issues.apache.org/jira/browse/SPARK-3188 > Project: Spark > Issue Type: New Feature > Components: MLlib >Affects Versions: 1.0.2 >Reporter: Fan Jiang >Priority: Critical > Labels: features > Fix For: 1.1.1, 1.2.0 > > Original Estimate: 0h > Remaining Estimate: 0h > > Linear least squares estimates assume the error has a normal distribution and > can behave badly when the errors are heavy-tailed. In practice we get > various types of data. We need to include Robust Regression to employ a > fitting criterion that is not as vulnerable as least squares. > The Tukey bisquare > weight function produces an M-estimator that is more resistant to regression > outliers than the Huber M-estimator (Andersen 2008: 19). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
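For reference, the Tukey bisquare (biweight) weight function named above is conventionally written as follows (standard definition, not quoted from the issue; k is a tuning constant, with k approximately 4.685 giving 95% efficiency under normal errors):
{noformat}
w(e) =
  \begin{cases}
    \left[ 1 - \left( \frac{e}{k} \right)^{2} \right]^{2} & \text{if } |e| \le k, \\
    0 & \text{if } |e| > k,
  \end{cases}
  \qquad k \approx 4.685
{noformat}
Because the weight drops to exactly zero beyond k, gross outliers contribute nothing to the fit, which is what makes this estimator more resistant than Huber's.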
[jira] [Updated] (SPARK-3188) Add Robust Regression Algorithm with Tukey bisquare weight function (Biweight Estimates)
[ https://issues.apache.org/jira/browse/SPARK-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Jiang updated SPARK-3188: - Summary: Add Robust Regression Algorithm with Tukey bisquare weight function (Biweight Estimates) (was: Add Robust Regression Algorithm with Turkey bisquare weight function (Biweight Estimates) ) > Add Robust Regression Algorithm with Tukey bisquare weight function > (Biweight Estimates) > -- > > Key: SPARK-3188 > URL: https://issues.apache.org/jira/browse/SPARK-3188 > Project: Spark > Issue Type: New Feature > Components: MLlib >Affects Versions: 1.0.2 >Reporter: Fan Jiang >Priority: Critical > Labels: features > Fix For: 1.1.1, 1.2.0 > > Original Estimate: 0h > Remaining Estimate: 0h > > Linear least squares estimates assume the error has a normal distribution and > can behave badly when the errors are heavy-tailed. In practice we get > various types of data. We need to include Robust Regression to employ a > fitting criterion that is not as vulnerable as least squares. > The Turkey bisquare weight function, also referred to as the biweight > function, produces an M-estimator that is more resistant to regression > outliers than the Huber M-estimator (Andersen 2008: 19). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3216) Spark-shell is broken for branch-1.0
[ https://issues.apache.org/jira/browse/SPARK-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-3216: - Description: This fails when EC2 tries to clone the most recent version of Spark from branch-1.0. I marked this a blocker because this is completely broken, but it is technically not "blocking" anything. This was caused by https://github.com/apache/spark/pull/1831, which broke spark-shell. The follow-up fix in https://github.com/apache/spark/pull/1825 was only merged into branch-1.1 and master, but not branch-1.0 was:This fails when EC2 tries to clone the most recent version of Spark from branch-1.0. I marked this a blocker because this is completely broken, but it is technically not "blocking" anything. > Spark-shell is broken for branch-1.0 > > > Key: SPARK-3216 > URL: https://issues.apache.org/jira/browse/SPARK-3216 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Andrew Or >Priority: Blocker > > This fails when EC2 tries to clone the most recent version of Spark from > branch-1.0. I marked this a blocker because this is completely broken, but it > is technically not "blocking" anything. > This was caused by https://github.com/apache/spark/pull/1831, which broke > spark-shell. The follow-up fix in https://github.com/apache/spark/pull/1825 > was only merged into branch-1.1 and master, but not branch-1.0 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3216) Spark-shell is broken for branch-1.0
[ https://issues.apache.org/jira/browse/SPARK-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-3216: - Description: This fails when EC2 tries to clone the most recent version of Spark from branch-1.0. This does not actually affect any released distributions, and so I did not set the affected/fix/target versions. I marked this a blocker because this is completely broken, but it is technically not "blocking" anything. This was caused by https://github.com/apache/spark/pull/1831, which broke spark-shell. The follow-up fix in https://github.com/apache/spark/pull/1825 was only merged into branch-1.1 and master, but not branch-1.0. was: This fails when EC2 tries to clone the most recent version of Spark from branch-1.0. I marked this a blocker because this is completely broken, but it is technically not "blocking" anything. This was caused by https://github.com/apache/spark/pull/1831, which broke spark-shell. The follow-up fix in https://github.com/apache/spark/pull/1825 was only merged into branch-1.1 and master, but not branch-1.0 > Spark-shell is broken for branch-1.0 > > > Key: SPARK-3216 > URL: https://issues.apache.org/jira/browse/SPARK-3216 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Andrew Or >Priority: Blocker > > This fails when EC2 tries to clone the most recent version of Spark from > branch-1.0. This does not actually affect any released distributions, and so > I did not set the affected/fix/target versions. I marked this a blocker > because this is completely broken, but it is technically not "blocking" > anything. > This was caused by https://github.com/apache/spark/pull/1831, which broke > spark-shell. The follow-up fix in https://github.com/apache/spark/pull/1825 > was only merged into branch-1.1 and master, but not branch-1.0. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3189) Add Robust Regression Algorithm with Tukey bisquare weight function (Biweight Estimates)
[ https://issues.apache.org/jira/browse/SPARK-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Jiang updated SPARK-3189: - Issue Type: Sub-task (was: New Feature) Parent: SPARK-3188 > Add Robust Regression Algorithm with Tukey bisquare weight function > (Biweight Estimates) > --- > > Key: SPARK-3189 > URL: https://issues.apache.org/jira/browse/SPARK-3189 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Affects Versions: 1.0.2 >Reporter: Fan Jiang >Priority: Critical > Labels: features > Fix For: 1.1.1, 1.2.0 > > Original Estimate: 0h > Remaining Estimate: 0h > > Linear least squares estimates assume the errors have a normal distribution and > can behave badly when the errors are heavy-tailed. In practice we get > various types of data. We need to include Robust Regression to employ a > fitting criterion that is not as vulnerable as least squares. > The Tukey bisquare weight function, also referred to as the biweight > function, produces an M-estimator that is more resistant to regression > outliers than the Huber M-estimator (Andersen 2008: 19). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3216) Spark-shell is broken for branch-1.0
[ https://issues.apache.org/jira/browse/SPARK-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109947#comment-14109947 ] Apache Spark commented on SPARK-3216: - User 'andrewor14' has created a pull request for this issue: https://github.com/apache/spark/pull/2122 > Spark-shell is broken for branch-1.0 > > > Key: SPARK-3216 > URL: https://issues.apache.org/jira/browse/SPARK-3216 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Andrew Or >Priority: Blocker > > This fails when EC2 tries to clone the most recent version of Spark from > branch-1.0. I marked this a blocker because this is completely broken, but it > is technically not "blocking" anything. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3217) Shaded Guava jar doesn't play well with Maven build
Cheng Lian created SPARK-3217: - Summary: Shaded Guava jar doesn't play well with Maven build Key: SPARK-3217 URL: https://issues.apache.org/jira/browse/SPARK-3217 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.0.2 Reporter: Cheng Lian Priority: Blocker PR [#1813|https://github.com/apache/spark/pull/1813] shaded the Guava jar file and moved Guava classes to package {{org.spark-project.guava}} when Spark is built by Maven. But code in {{org.apache.spark.util.Utils}} still refers to classes (e.g. {{ThreadFactoryBuilder}}) in package {{com.google.common}}. The result is that, when Spark is built with Maven (or {{make-distribution.sh}}), commands like {{bin/spark-shell}} throw a {{ClassNotFoundException}}:
{code}
# Build Spark with Maven
$ mvn clean package -Phive,hadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests
...
# Then spark-shell complains
$ ./bin/spark-shell
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/util/concurrent/ThreadFactoryBuilder
        at org.apache.spark.util.Utils$.<init>(Utils.scala:636)
        at org.apache.spark.util.Utils$.<clinit>(Utils.scala)
        at org.apache.spark.repl.SparkILoop.<init>(SparkILoop.scala:134)
        at org.apache.spark.repl.SparkILoop.<init>(SparkILoop.scala:65)
        at org.apache.spark.repl.Main$.main(Main.scala:30)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:317)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.google.common.util.concurrent.ThreadFactoryBuilder
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 13 more
# Check the assembly jar file
$ jar tf assembly/target/scala-2.10/spark-assembly-1.1.0-SNAPSHOT-hadoop2.3.0.jar | grep -i ThreadFactoryBuilder
org/spark-project/guava/common/util/concurrent/ThreadFactoryBuilder$1.class
org/spark-project/guava/common/util/concurrent/ThreadFactoryBuilder.class
{code}
The SBT build is fine since we don't shade Guava with SBT right now (and that's why Jenkins didn't complain about this). Possible solutions can be: # revert PR #1813 to be safe, or # also shade Guava in the SBT build and use only {{org.spark-project.guava}} in Spark -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
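As a rough illustration of the second option, an SBT-side relocation could look like the following build.sbt fragment. This assumes an sbt-assembly version that supports shade rules ({{ShadeRule}} exists only in later sbt-assembly releases), so treat it as a hypothetical sketch rather than a tested fix:
{code}
// Hypothetical build.sbt fragment for option 2 -- shading Guava in the SBT
// build to match the Maven relocation. Requires an sbt-assembly release that
// supports ShadeRule; illustrative only.
assemblyShadeRules in assembly := Seq(
  // Relocate com.google.* the same way the Maven build does, so the assembly
  // contains org.spark-project.guava.common.* regardless of the build tool.
  ShadeRule.rename("com.google.**" -> "org.spark-project.guava.@1").inAll
)
{code}
Because {{inAll}} rewrites references in Spark's own classes as well, a rule like this would also keep {{org.apache.spark.util.Utils}} pointing at the relocated package, avoiding the mismatch shown in the stack trace above.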
[jira] [Created] (SPARK-3216) Spark-shell is broken for branch-1.0
Andrew Or created SPARK-3216: Summary: Spark-shell is broken for branch-1.0 Key: SPARK-3216 URL: https://issues.apache.org/jira/browse/SPARK-3216 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Andrew Or Priority: Blocker This fails when EC2 tries to clone the most recent version of Spark from branch-1.0. I marked this a blocker because this is completely broken, but it is technically not "blocking" anything. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3215) Add remote interface for SparkContext
[ https://issues.apache.org/jira/browse/SPARK-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-3215: -- Attachment: RemoteSparkContext.pdf Initial proposal for a remote context interface. Note that this is not a formal design document, just a high-level proposal, so it doesn't go deeply into what APIs would be exposed or anything like that. > Add remote interface for SparkContext > - > > Key: SPARK-3215 > URL: https://issues.apache.org/jira/browse/SPARK-3215 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Reporter: Marcelo Vanzin > Labels: hive > Attachments: RemoteSparkContext.pdf > > > A quick description of the issue: as part of running Hive jobs on top of > Spark, it's desirable to have a SparkContext that is running in the > background and listening for job requests for a particular user session. > Running multiple contexts in the same JVM is not a very good solution. Not > only does SparkContext currently have issues sharing the same JVM among multiple > instances, but it also turns the JVM running the contexts into a huge bottleneck > in the system. > So I'm proposing a solution where we have a SparkContext that is running in a > separate process, and listening for requests from the client application via > some RPC interface (most probably Akka). > I'll attach a document shortly with the current proposal. Let's use this bug > to discuss the proposal and any other suggestions. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3215) Add remote interface for SparkContext
Marcelo Vanzin created SPARK-3215: - Summary: Add remote interface for SparkContext Key: SPARK-3215 URL: https://issues.apache.org/jira/browse/SPARK-3215 Project: Spark Issue Type: New Feature Components: Spark Core Reporter: Marcelo Vanzin A quick description of the issue: as part of running Hive jobs on top of Spark, it's desirable to have a SparkContext that is running in the background and listening for job requests for a particular user session. Running multiple contexts in the same JVM is not a very good solution. Not only does SparkContext currently have issues sharing the same JVM among multiple instances, but it also turns the JVM running the contexts into a huge bottleneck in the system. So I'm proposing a solution where we have a SparkContext that is running in a separate process, and listening for requests from the client application via some RPC interface (most probably Akka). I'll attach a document shortly with the current proposal. Let's use this bug to discuss the proposal and any other suggestions. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
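Purely to illustrate the shape such an interface might take (this is not the API from the attached proposal; every name below is hypothetical), a client-side facade for a background context could look like:
{code}
// Hypothetical client-side facade for a SparkContext living in a separate
// process. All names here are illustrative, not from the attached proposal.
trait RemoteSparkContext {
  /** Submit a job to the long-running background context. */
  def submitJob(mainClass: String, args: Seq[String]): JobHandle

  /** Poll the state of a previously submitted job. */
  def status(handle: JobHandle): JobState

  /** Tear down the background context for this user session. */
  def stop(): Unit
}

final case class JobHandle(id: String)

sealed trait JobState
case object Queued extends JobState
case object Running extends JobState
case object Succeeded extends JobState
final case class Failed(reason: String) extends JobState
{code}
The RPC layer (Akka, per the description) would sit behind such a facade, so the client process never hosts a SparkContext itself and each user session can talk to its own context process.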
[jira] [Updated] (SPARK-3213) spark_ec2.py cannot find slave instances launched with "Launch More Like This"
[ https://issues.apache.org/jira/browse/SPARK-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-3213: - Summary: spark_ec2.py cannot find slave instances launched with "Launch More Like This" (was: spark_ec2.py cannot find slave instances) > spark_ec2.py cannot find slave instances launched with "Launch More Like This" > -- > > Key: SPARK-3213 > URL: https://issues.apache.org/jira/browse/SPARK-3213 > Project: Spark > Issue Type: Bug > Components: EC2 >Affects Versions: 1.1.0 >Reporter: Joseph K. Bradley >Priority: Blocker > > spark_ec2.py cannot find all slave instances. In particular: > * I created a master & slave and configured them. > * I created new slave instances from the original slave ("Launch More Like > This"). > * I tried to relaunch the cluster, and it could only find the original slave. > Old versions of the script worked. The latest working commit which edited > that .py script is: a0bcbc159e89be868ccc96175dbf1439461557e1 > There may be a problem with this PR: > [https://github.com/apache/spark/pull/1899]. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-3213) spark_ec2.py cannot find slave instances
[ https://issues.apache.org/jira/browse/SPARK-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109818#comment-14109818 ] Vida Ha edited comment on SPARK-3213 at 8/25/14 9:57 PM: - Joseph, Josh, & I discussed in person. There are two quick workarounds: 1) Use an old version of the spark_ec2 scripts that uses security groups to identify the slaves, if using "Launch more like this" 2) Avoid using "Launch more like this" But now I need to investigate: If using "launch more like this", it does seem like Amazon tries to reuse the tags, but I'm wondering if it doesn't like having multiple machines with the same "Name" tag. I will try using a different tag, like "spark-ec2-cluster-id" or something like that to identify the machine. If that tag does copy over, then we can properly support "Launch more like this". was (Author: vidaha): Joseph, Josh, & I discussed in person. There is a quick workaround: 1) Use an old version of the spark_ec2 scripts that uses security groups to identify the slaves, if using "Launch more like this" But now I need to investigate: If using "launch more like this", it does seem like Amazon tries to reuse the tags, but I'm wondering if it doesn't like having multiple machines with the same "Name" tag. I will try using a different tag, like "spark-ec2-cluster-id" or something like that to identify the machine. If that tag does copy over, then we can properly support "Launch more like this". > spark_ec2.py cannot find slave instances > > > Key: SPARK-3213 > URL: https://issues.apache.org/jira/browse/SPARK-3213 > Project: Spark > Issue Type: Bug > Components: EC2 >Affects Versions: 1.1.0 >Reporter: Joseph K. Bradley >Priority: Blocker > > spark_ec2.py cannot find all slave instances. In particular: > * I created a master & slave and configured them. > * I created new slave instances from the original slave ("Launch More Like > This"). > * I tried to relaunch the cluster, and it could only find the original slave. > Old versions of the script worked. The latest working commit which edited > that .py script is: a0bcbc159e89be868ccc96175dbf1439461557e1 > There may be a problem with this PR: > [https://github.com/apache/spark/pull/1899]. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3213) spark_ec2.py cannot find slave instances
[ https://issues.apache.org/jira/browse/SPARK-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109828#comment-14109828 ] Vida Ha commented on SPARK-3213: Can someone rename this issue to: spark_ec2.py cannot find slave instances launched with "Launch More Like This"? I think that's more indicative of the issue - it's no broader than that. > spark_ec2.py cannot find slave instances > > > Key: SPARK-3213 > URL: https://issues.apache.org/jira/browse/SPARK-3213 > Project: Spark > Issue Type: Bug > Components: EC2 >Affects Versions: 1.1.0 >Reporter: Joseph K. Bradley >Priority: Blocker > > spark_ec2.py cannot find all slave instances. In particular: > * I created a master & slave and configured them. > * I created new slave instances from the original slave ("Launch More Like > This"). > * I tried to relaunch the cluster, and it could only find the original slave. > Old versions of the script worked. The latest working commit which edited > that .py script is: a0bcbc159e89be868ccc96175dbf1439461557e1 > There may be a problem with this PR: > [https://github.com/apache/spark/pull/1899]. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3213) spark_ec2.py cannot find slave instances
[ https://issues.apache.org/jira/browse/SPARK-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109818#comment-14109818 ] Vida Ha commented on SPARK-3213: Joseph, Josh, & I discussed in person. There is a quick workaround: 1) Use an old version of the spark_ec2 scripts that uses security groups to identify the slaves, if using "Launch more like this" But now I need to investigate: If using "launch more like this", it does seem like Amazon tries to reuse the tags, but I'm wondering if it doesn't like having multiple machines with the same "Name" tag. I will try using a different tag, like "spark-ec2-cluster-id" or something like that to identify the machine. If that tag does copy over, then we can properly support "Launch more like this". > spark_ec2.py cannot find slave instances > > > Key: SPARK-3213 > URL: https://issues.apache.org/jira/browse/SPARK-3213 > Project: Spark > Issue Type: Bug > Components: EC2 >Affects Versions: 1.1.0 >Reporter: Joseph K. Bradley >Priority: Blocker > > spark_ec2.py cannot find all slave instances. In particular: > * I created a master & slave and configured them. > * I created new slave instances from the original slave ("Launch More Like > This"). > * I tried to relaunch the cluster, and it could only find the original slave. > Old versions of the script worked. The latest working commit which edited > that .py script is: a0bcbc159e89be868ccc96175dbf1439461557e1 > There may be a problem with this PR: > [https://github.com/apache/spark/pull/1899]. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3214) Argument parsing loop in make-distribution.sh ends prematurely
[ https://issues.apache.org/jira/browse/SPARK-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109765#comment-14109765 ] Cheng Lian commented on SPARK-3214: --- Didn't realize all Maven options must go after other {{make-distribution.sh}} options. Closing this. > Argument parsing loop in make-distribution.sh ends prematurely > -- > > Key: SPARK-3214 > URL: https://issues.apache.org/jira/browse/SPARK-3214 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.0.2 >Reporter: Cheng Lian >Priority: Minor > > Running {{make-distribution.sh}} in this way: > {code} > ./make-distribution.sh --hadoop -Pyarn > {code} > results in a proper error message: > {code} > Error: '--hadoop' is no longer supported: > Error: use Maven options -Phadoop.version and -Pyarn.version > {code} > But if you run it with the options placed in reverse order, it just passes: > {code} > ./make-distribution.sh -Pyarn --hadoop > {code} > The reason is that the {{while}} loop ends prematurely before checking all > potentially deprecated command line options. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-3214) Argument parsing loop in make-distribution.sh ends prematurely
[ https://issues.apache.org/jira/browse/SPARK-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian closed SPARK-3214. - Resolution: Not a Problem > Argument parsing loop in make-distribution.sh ends prematurely > -- > > Key: SPARK-3214 > URL: https://issues.apache.org/jira/browse/SPARK-3214 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.0.2 >Reporter: Cheng Lian >Priority: Minor > > Running {{make-distribution.sh}} in this way: > {code} > ./make-distribution.sh --hadoop -Pyarn > {code} > results in a proper error message: > {code} > Error: '--hadoop' is no longer supported: > Error: use Maven options -Phadoop.version and -Pyarn.version > {code} > But if you run it with the options placed in reverse order, it just passes: > {code} > ./make-distribution.sh -Pyarn --hadoop > {code} > The reason is that the {{while}} loop ends prematurely before checking all > potentially deprecated command line options. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2798) Correct several small errors in Flume module pom.xml files
[ https://issues.apache.org/jira/browse/SPARK-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109735#comment-14109735 ] Tathagata Das commented on SPARK-2798: -- Naah, that was already closed by the fix I did on Friday (https://github.com/apache/spark/pull/2101). Maven and therefore make-distribution should work fine with that fix. > Correct several small errors in Flume module pom.xml files > -- > > Key: SPARK-2798 > URL: https://issues.apache.org/jira/browse/SPARK-2798 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Minor > Fix For: 1.1.0 > > > (EDIT) Since the scalatest issue has since been resolved, this is now about a few > small problems in the Flume Sink pom.xml > - scalatest is not declared as a test-scope dependency > - Its Avro version doesn't match the rest of the build > - Its Flume version is not synced with the other Flume module > - The other Flume module declares its dependency on Flume Sink slightly > incorrectly, hard-coding the Scala 2.10 version > - It depends on Scala Lang directly, which it shouldn't -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3214) Argument parsing loop in make-distribution.sh ends prematurely
Cheng Lian created SPARK-3214: - Summary: Argument parsing loop in make-distribution.sh ends prematurely Key: SPARK-3214 URL: https://issues.apache.org/jira/browse/SPARK-3214 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.0.2 Reporter: Cheng Lian Priority: Minor Running {{make-distribution.sh}} in this way:
{code}
./make-distribution.sh --hadoop -Pyarn
{code}
results in a proper error message:
{code}
Error: '--hadoop' is no longer supported:
Error: use Maven options -Phadoop.version and -Pyarn.version
{code}
But if you run it with the options placed in reverse order, it just passes:
{code}
./make-distribution.sh -Pyarn --hadoop
{code}
The reason is that the {{while}} loop ends prematurely before checking all potentially deprecated command line options. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3180) Better control of security groups
[ https://issues.apache.org/jira/browse/SPARK-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-3180. --- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 2088 [https://github.com/apache/spark/pull/2088] > Better control of security groups > - > > Key: SPARK-3180 > URL: https://issues.apache.org/jira/browse/SPARK-3180 > Project: Spark > Issue Type: Improvement >Reporter: Allan Douglas R. de Oliveira > Fix For: 1.3.0 > > > Two features can be combined to provide better control of security > group policies: > - The ability to specify the address authorized to access the default > security group (instead of allowing everyone: 0.0.0.0/0) > - The possibility to place the created machines in a custom security group > One can combine the two flags to restrict external access to > the provided security group (e.g. by setting the authorized address to > 127.0.0.1/32) while maintaining compatibility with the current behavior. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3156) DecisionTree: Order categorical features adaptively
[ https://issues.apache.org/jira/browse/SPARK-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-3156: - Assignee: Joseph K. Bradley > DecisionTree: Order categorical features adaptively > --- > > Key: SPARK-3156 > URL: https://issues.apache.org/jira/browse/SPARK-3156 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Joseph K. Bradley >Assignee: Joseph K. Bradley > > Improvement: accuracy > Currently, ordered categorical features use a fixed bin ordering chosen > before training based on a subsample of the data. (See the code using > centroids in findSplitsBins().) > Proposal: Choose the ordering adaptively for every split. This would require > a bit more computation on the master, but could improve results by splitting > more intelligently. > Required changes: The result of aggregation is used in > findAggForOrderedFeatureClassification() to compute running totals over the > pre-set ordering of categorical feature values. The stats should instead be > used to choose a new ordering of categories, before computing running totals. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
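To make the proposal concrete, here is a rough Scala sketch (not the MLlib code) of choosing a category ordering from per-node label statistics; for binary classification or regression, sorting categories by their mean label means only contiguous prefixes of the ordering need to be evaluated as candidate splits:
{code}
// Rough sketch of adaptive category ordering at a node -- not MLlib code.
// stats maps each category value to (sum of labels, sample count)
// aggregated from the examples reaching this node.
def orderCategories(stats: Map[Int, (Double, Long)]): Seq[Int] =
  stats.toSeq
    .map { case (cat, (labelSum, count)) => (cat, labelSum / count) }
    .sortBy { case (_, meanLabel) => meanLabel } // centroid (mean-label) ordering
    .map { case (cat, _) => cat }

// Candidate splits are then the contiguous prefixes of the ordering:
// {c1}, {c1,c2}, ..., {c1,...,c(k-1)} versus the remaining categories.
{code}
Under the adaptive variant described above, this ordering would be recomputed from each node's own aggregates at every split, rather than fixed once before training from a subsample.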
[jira] [Commented] (SPARK-3213) spark_ec2.py cannot find slave instances
[ https://issues.apache.org/jira/browse/SPARK-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109700#comment-14109700 ] Joseph K. Bradley commented on SPARK-3213: -- The security group name I was using was "joseph-r3.2xlarge-slaves". It may be a regex/matching issue. > spark_ec2.py cannot find slave instances > > > Key: SPARK-3213 > URL: https://issues.apache.org/jira/browse/SPARK-3213 > Project: Spark > Issue Type: Bug > Components: EC2 >Affects Versions: 1.1.0 >Reporter: Joseph K. Bradley >Priority: Blocker > > spark_ec2.py cannot find all slave instances. In particular: > * I created a master & slave and configured them. > * I created new slave instances from the original slave ("Launch More Like > This"). > * I tried to relaunch the cluster, and it could only find the original slave. > Old versions of the script worked. The latest working commit which edited > that .py script is: a0bcbc159e89be868ccc96175dbf1439461557e1 > There may be a problem with this PR: > [https://github.com/apache/spark/pull/1899]. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3213) spark_ec2.py cannot find slave instances
[ https://issues.apache.org/jira/browse/SPARK-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109697#comment-14109697 ] Joseph K. Bradley commented on SPARK-3213: -- [~vidaha] Please take a look. Thanks! > spark_ec2.py cannot find slave instances > > > Key: SPARK-3213 > URL: https://issues.apache.org/jira/browse/SPARK-3213 > Project: Spark > Issue Type: Bug > Components: EC2 >Affects Versions: 1.1.0 >Reporter: Joseph K. Bradley >Priority: Blocker > > spark_ec2.py cannot find all slave instances. In particular: > * I created a master & slave and configured them. > * I created new slave instances from the original slave ("Launch More Like > This"). > * I tried to relaunch the cluster, and it could only find the original slave. > Old versions of the script worked. The latest working commit which edited > that .py script is: a0bcbc159e89be868ccc96175dbf1439461557e1 > There may be a problem with this PR: > [https://github.com/apache/spark/pull/1899]. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3213) spark_ec2.py cannot find slave instances
Joseph K. Bradley created SPARK-3213: Summary: spark_ec2.py cannot find slave instances Key: SPARK-3213 URL: https://issues.apache.org/jira/browse/SPARK-3213 Project: Spark Issue Type: Bug Components: EC2 Affects Versions: 1.1.0 Reporter: Joseph K. Bradley Priority: Blocker spark_ec2.py cannot find all slave instances. In particular: * I created a master & slave and configured them. * I created new slave instances from the original slave ("Launch More Like This"). * I tried to relaunch the cluster, and it could only find the original slave. Old versions of the script worked. The latest working commit which edited that .py script is: a0bcbc159e89be868ccc96175dbf1439461557e1 There may be a problem with this PR: [https://github.com/apache/spark/pull/1899]. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3044) Create RSS feed for Spark News
[ https://issues.apache.org/jira/browse/SPARK-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109677#comment-14109677 ] Nicholas Chammas commented on SPARK-3044: - Hi Michael, I don't know if the site itself is open-source. We might need someone from Databricks to update it. [~pwendell], [~rxin] - Is it possible for contributors to contribute to the [main Spark site|http://spark.apache.org/]? > Create RSS feed for Spark News > -- > > Key: SPARK-3044 > URL: https://issues.apache.org/jira/browse/SPARK-3044 > Project: Spark > Issue Type: Documentation >Reporter: Nicholas Chammas >Priority: Minor > > Project updates are often posted here: http://spark.apache.org/news/ > Currently, there is no way to subscribe to a feed of these updates. It would > be nice if there were a way for people to be notified of new posts there without > having to check manually. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2798) Correct several small errors in Flume module pom.xml files
[ https://issues.apache.org/jira/browse/SPARK-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109675#comment-14109675 ] Sean Owen commented on SPARK-2798: -- [~tdas] Cool, I think this closes SPARK-3169 too, if I understand correctly. > Correct several small errors in Flume module pom.xml files > -- > > Key: SPARK-2798 > URL: https://issues.apache.org/jira/browse/SPARK-2798 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Minor > Fix For: 1.1.0 > > > (EDIT) Since the scalatest issue has since been resolved, this is now about a few > small problems in the Flume Sink pom.xml > - scalatest is not declared as a test-scope dependency > - Its Avro version doesn't match the rest of the build > - Its Flume version is not synced with the other Flume module > - The other Flume module declares its dependency on Flume Sink slightly > incorrectly, hard-coding the Scala 2.10 version > - It depends on Scala Lang directly, which it shouldn't -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org