[jira] [Resolved] (SPARK-5108) Need to make jackson dependency version consistent with hadoop-2.6.0.

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-5108.
--
   Resolution: Fixed
Fix Version/s: 1.3.0

Issue resolved by pull request 3938
[https://github.com/apache/spark/pull/3938]

 Need to make jackson dependency version consistent with hadoop-2.6.0.
 -

 Key: SPARK-5108
 URL: https://issues.apache.org/jira/browse/SPARK-5108
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Zhan Zhang
 Fix For: 1.3.0









[jira] [Updated] (SPARK-5108) Need to make jackson dependency version consistent with hadoop-2.6.0.

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5108:
-
Assignee: Zhan Zhang

 Need to make jackson dependency version consistent with hadoop-2.6.0.
 -

 Key: SPARK-5108
 URL: https://issues.apache.org/jira/browse/SPARK-5108
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Zhan Zhang
Assignee: Zhan Zhang
 Fix For: 1.3.0









[jira] [Created] (SPARK-5669) Spark assembly includes incompatibly licensed libgfortran, libgcc code via JBLAS

2015-02-07 Thread Sean Owen (JIRA)
Sean Owen created SPARK-5669:


 Summary: Spark assembly includes incompatibly licensed 
libgfortran, libgcc code via JBLAS
 Key: SPARK-5669
 URL: https://issues.apache.org/jira/browse/SPARK-5669
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Sean Owen
Priority: Blocker
 Fix For: 1.3.0


Sorry for Blocker, but it's a license issue. The Spark assembly includes the 
following, from JBLAS:

{code}
lib/
lib/static/
lib/static/Mac OS X/
lib/static/Mac OS X/x86_64/
lib/static/Mac OS X/x86_64/libjblas_arch_flavor.jnilib
lib/static/Mac OS X/x86_64/sse3/
lib/static/Mac OS X/x86_64/sse3/libjblas.jnilib
lib/static/Windows/
lib/static/Windows/x86/
lib/static/Windows/x86/libgfortran-3.dll
lib/static/Windows/x86/libgcc_s_dw2-1.dll
lib/static/Windows/x86/jblas_arch_flavor.dll
lib/static/Windows/x86/sse3/
lib/static/Windows/x86/sse3/jblas.dll
lib/static/Windows/amd64/
lib/static/Windows/amd64/libgfortran-3.dll
lib/static/Windows/amd64/jblas.dll
lib/static/Windows/amd64/libgcc_s_sjlj-1.dll
lib/static/Windows/amd64/jblas_arch_flavor.dll
lib/static/Linux/
lib/static/Linux/i386/
lib/static/Linux/i386/sse3/
lib/static/Linux/i386/sse3/libjblas.so
lib/static/Linux/i386/libjblas_arch_flavor.so
lib/static/Linux/amd64/
lib/static/Linux/amd64/sse3/
lib/static/Linux/amd64/sse3/libjblas.so
lib/static/Linux/amd64/libjblas_arch_flavor.so
{code}

Unfortunately the libgfortran and libgcc libraries included for Windows are not 
licensed in a way that's compatible with Spark and the AL2 -- LGPL at least.

It's easy to exclude them. I'm not clear what it does to running on Windows; I 
assume it can still work but the libs would have to be made available locally 
and put on the shared library path manually. I don't think there's a package 
manager as in Linux that would make it easily available. I'm not able to test 
on Windows.

If it doesn't work, the follow-up question is whether that means JBLAS has to 
be removed on the double, or treated as a known issue for 1.3.0.
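
For downstream builds that assemble Spark themselves, one illustrative way to strip those Windows natives is an sbt-assembly merge strategy. This is a sketch only, not the patch for this issue, and it assumes sbt-assembly's {{assemblyMergeStrategy}} key:

{code}
// build.sbt fragment (illustrative): discard JBLAS's bundled Windows natives
// (the libgfortran / libgcc DLLs) when building the assembly jar.
assemblyMergeStrategy in assembly := {
  case PathList("lib", "static", "Windows", _*) => MergeStrategy.discard
  case other =>
    val defaultStrategy = (assemblyMergeStrategy in assembly).value
    defaultStrategy(other)
}
{code}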






[jira] [Updated] (SPARK-2451) Enable to load config file for Akka

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-2451:
-
Component/s: Spark Core
   Priority: Minor  (was: Major)
 Issue Type: Improvement  (was: Bug)

 Enable to load config file for Akka
 ---

 Key: SPARK-2451
 URL: https://issues.apache.org/jira/browse/SPARK-2451
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Kousuke Saruta
Priority: Minor

 In the current implementation, we cannot let Akka load a config file.
 Sometimes we want to use a custom config file for Akka.
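
For illustration of what is being asked for, here is a sketch only, not an API Spark currently exposes; the file name custom-akka.conf is made up. It builds an ActorSystem from a user-supplied Typesafe Config file:

{code}
import java.io.File

import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

// Load a custom Akka config file and fall back to the reference/application
// config on the classpath for anything it does not set.
val customConfig = ConfigFactory.parseFile(new File("custom-akka.conf"))
val mergedConfig = customConfig.withFallback(ConfigFactory.load())
val actorSystem = ActorSystem("sparkDriver", mergedConfig)
{code}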






[jira] [Updated] (SPARK-4617) Fix spark.yarn.applicationMaster.waitTries doc

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-4617:
-
Priority: Minor  (was: Major)

Is the change here to remove the doc for this property? The current code says 
that this config is deprecated.

 Fix spark.yarn.applicationMaster.waitTries doc
 --

 Key: SPARK-4617
 URL: https://issues.apache.org/jira/browse/SPARK-4617
 Project: Spark
  Issue Type: Bug
  Components: Documentation, YARN
Affects Versions: 1.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Priority: Minor








[jira] [Updated] (SPARK-1061) allow Hadoop RDDs to be read w/ a partitioner

2015-02-07 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-1061:

Component/s: Spark Core

 allow Hadoop RDDs to be read w/ a partitioner
 -

 Key: SPARK-1061
 URL: https://issues.apache.org/jira/browse/SPARK-1061
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Reporter: Imran Rashid
Assignee: Imran Rashid

 Using partitioners to get narrow dependencies can save tons of time on a 
 shuffle.  However, after saving an RDD to hdfs, and then reloading it, all 
 partitioner information is lost.  This means that you can never get a narrow 
 dependency when loading data from hadoop.
 I think we could get around this by:
 1) having a modified version of hadoop rdd that kept track of original part 
 file (or maybe just prevent splits altogether ...)
 2) add an assumePartition(partitioner: Partitioner, verify: Boolean) function 
 to RDD (see the sketch below).  It would create a new RDD, which had the 
 exact same data but just 
 pretended that the RDD had the given partitioner applied to it.  And if 
 verify=true, it could add a mapPartitionsWithIndex to check that each record 
 was in the right partition.
 http://apache-spark-user-list.1001560.n3.nabble.com/setting-partitioners-with-hadoop-rdds-td976.html
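
A minimal illustrative sketch of idea 2, referenced above (the class below is made up and only uses Spark's public RDD API; it is not an existing Spark feature):

{code}
import scala.reflect.ClassTag

import org.apache.spark.{Partition, Partitioner, TaskContext}
import org.apache.spark.rdd.RDD

// Illustrative only: wrap an RDD and assert, without moving any data, that it
// is already laid out according to `part`. The caller is responsible for that
// assumption being true.
class AssumePartitionedRDD[K, V](prev: RDD[(K, V)], part: Partitioner)
                                (implicit ct: ClassTag[(K, V)])
  extends RDD[(K, V)](prev) {

  override val partitioner: Option[Partitioner] = Some(part)

  override protected def getPartitions: Array[Partition] = prev.partitions

  override def compute(split: Partition, context: TaskContext): Iterator[(K, V)] =
    prev.iterator(split, context)
}
{code}

The verify=true variant could then be a mapPartitionsWithIndex pass that checks part.getPartition(key) == partitionIndex for every record.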






[jira] [Updated] (SPARK-5664) Restore stty settings when exiting for launching spark-shell from SBT

2015-02-07 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-5664:

Component/s: Build

 Restore stty settings when exiting for launching spark-shell from SBT
 -

 Key: SPARK-5664
 URL: https://issues.apache.org/jira/browse/SPARK-5664
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Liang-Chi Hsieh








[jira] [Commented] (SPARK-4820) Spark build encounters File name too long on some encrypted filesystems

2015-02-07 Thread Jian Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311061#comment-14311061
 ] 

Jian Zhou commented on SPARK-4820:
--

Encountered this issue in encfs, and this workaround works.

 Spark build encounters File name too long on some encrypted filesystems
 -

 Key: SPARK-4820
 URL: https://issues.apache.org/jira/browse/SPARK-4820
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Patrick Wendell

 This was reported by Luchesar Cekov on github along with a proposed fix. The 
 fix has some potential downstream issues (it will modify the classnames) so 
 until we understand better how many users are affected we aren't going to 
 merge it. However, I'd like to include the issue and workaround here. If you 
 encounter this issue please comment on the JIRA so we can assess the 
 frequency.
 The issue produces this error:
 {code}
 [error] == Expanded type of tree ==
 [error] 
 [error] ConstantType(value = Constant(Throwable))
 [error] 
 [error] uncaught exception during compilation: java.io.IOException
 [error] File name too long
 [error] two errors found
 {code}
 The workaround is in maven under the compile options add: 
 {code}
 +  <arg>-Xmax-classfile-name</arg>
 +  <arg>128</arg>
 {code}
 In SBT add:
 {code}
 +scalacOptions in Compile ++= Seq("-Xmax-classfile-name", "128"),
 {code}






[jira] [Updated] (SPARK-5524) Remove messy dependencies to log4j

2015-02-07 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5524:
---
Component/s: Spark Core

 Remove messy dependencies to log4j
 --

 Key: SPARK-5524
 URL: https://issues.apache.org/jira/browse/SPARK-5524
 Project: Spark
  Issue Type: Task
  Components: Spark Core
Reporter: Jacek Lewandowski

 There are some tickets regarding loosening the dependency on Log4j, however 
 some classes still use the following scheme:
 {code}
   if (Logger.getLogger(classOf[SomeClass]).getLevel == null) {
 Logger.getLogger(classOf[SomeClass]).setLevel(someLevel)
   }
 {code}
 This doesn't look good and makes it difficult to track why some logs are 
 missing when you use Log4j and why they are flooding when you use something 
 else, like Logback. 
 There is a Logging class which checks whether we use Log4j or not. Why not 
 delegate all such invocations to the Logging class, which could handle them 
 properly, maybe considering more logging implementations?
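
As a rough sketch of the delegation being proposed (the names and the log4j-detection heuristic below are made up for illustration and differ from Spark's actual Logging trait):

{code}
import org.apache.log4j.{Level, LogManager, Logger}

// Illustrative helper: only touch log4j levels when log4j appears to be the
// active backend and the logger has no explicitly configured level.
object LogLevelDefaults {
  private def log4jInUse: Boolean =
    LogManager.getRootLogger.getAllAppenders.hasMoreElements

  def setDefaultLevel(clazz: Class[_], level: Level): Unit = {
    if (log4jInUse) {
      val logger = Logger.getLogger(clazz)
      if (logger.getLevel == null) {
        logger.setLevel(level)
      }
    }
  }
}
{code}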






[jira] [Commented] (SPARK-5524) Remove messy dependencies to log4j

2015-02-07 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1430#comment-1430
 ] 

Patrick Wendell commented on SPARK-5524:


[~nchammas] I don't think this is related to the build, so I've changed the 
component.

 Remove messy dependencies to log4j
 --

 Key: SPARK-5524
 URL: https://issues.apache.org/jira/browse/SPARK-5524
 Project: Spark
  Issue Type: Task
  Components: Spark Core
Reporter: Jacek Lewandowski

 There are some tickets regarding loosening the dependency on Log4j, however 
 some classes still use the following scheme:
 {code}
   if (Logger.getLogger(classOf[SomeClass]).getLevel == null) {
 Logger.getLogger(classOf[SomeClass]).setLevel(someLevel)
   }
 {code}
 This doesn't look good and makes it difficult to track why some logs are 
 missing when you use Log4j and why they are flooding when you use something 
 else, like Logback. 
 There is a Logging class which checks whether we use Log4j or not. Why not 
 delegate all such invocations to the Logging class, which could handle them 
 properly, maybe considering more logging implementations?






[jira] [Updated] (SPARK-5524) Remove messy dependencies to log4j

2015-02-07 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5524:
---
Component/s: (was: Build)

 Remove messy dependencies to log4j
 --

 Key: SPARK-5524
 URL: https://issues.apache.org/jira/browse/SPARK-5524
 Project: Spark
  Issue Type: Task
  Components: Spark Core
Reporter: Jacek Lewandowski

 There are some tickets regarding loosening the dependency on Log4j, however 
 some classes still use the following scheme:
 {code}
   if (Logger.getLogger(classOf[SomeClass]).getLevel == null) {
 Logger.getLogger(classOf[SomeClass]).setLevel(someLevel)
   }
 {code}
 This doesn't look good and makes it difficult to track why some logs are 
 missing when you use Log4j and why they are flooding when you use something 
 else, like Logback. 
 There is a Logging class which checks whether we use Log4j or not. Why not 
 delegate all such invocations to the Logging class, which could handle them 
 properly, maybe considering more logging implementations?






[jira] [Created] (SPARK-5673) Implement Streaming wrapper for all linear methods

2015-02-07 Thread Kirill A. Korinskiy (JIRA)
Kirill A. Korinskiy created SPARK-5673:
--

 Summary: Implement Streaming wrapper for all linear methods
 Key: SPARK-5673
 URL: https://issues.apache.org/jira/browse/SPARK-5673
 Project: Spark
  Issue Type: New Feature
Reporter: Kirill A. Korinskiy


Right now Spark has streaming wrappers only for Logistic and Linear regression.

So, implementing wrappers for SVM, Lasso and Ridge Regression would make the 
streaming API more useful.
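
For context, here is roughly how the existing streaming linear regression wrapper is used (a usage sketch; the SVM, Lasso and Ridge analogues are exactly what this ticket proposes to add):

{code}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, StreamingLinearRegressionWithSGD}
import org.apache.spark.streaming.dstream.DStream

// Usage sketch of the existing streaming wrapper for linear regression.
def trainAndPredict(training: DStream[LabeledPoint], numFeatures: Int): Unit = {
  val model = new StreamingLinearRegressionWithSGD()
    .setInitialWeights(Vectors.zeros(numFeatures))
    .setStepSize(0.1)
  model.trainOn(training)  // the model is updated on every batch
  model.predictOnValues(training.map(lp => (lp.label, lp.features))).print()
}
{code}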






[jira] [Resolved] (SPARK-4647) yarn-client mode reports success even though job fails

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-4647.
--
Resolution: Duplicate

Also duplicates SPARK-3293

 yarn-client mode reports success even though job fails
 --

 Key: SPARK-4647
 URL: https://issues.apache.org/jira/browse/SPARK-4647
 Project: Spark
  Issue Type: Bug
  Components: YARN
Reporter: SaintBacchus

 YARN's web UI shows SUCCEEDED even when the driver throws an exception in yarn-client mode






[jira] [Resolved] (SPARK-5670) Spark artifacts compiled with Hadoop 1.x

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-5670.
--
Resolution: Not a Problem

This is another question that should be asked at user@, please.

The artifacts published to Maven can only be compiled against one version of 
anything. Well, you can make a bunch of different artifacts with different 
{{classifier}}s, but here the idea is that it doesn't matter: you are always 
compiling against these artifacts as an API, and never relying on them for 
their transitive Hadoop dependency. You mark these dependencies as provided 
in your app, and when executed on a cluster, your app picks up the correct 
dependencies for that cluster.

Your error suggests that you have actually bundled old Hadoop code into your 
application. Don't do that; use provided scope.
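
In SBT terms the advice amounts to something like the following (a sketch; the version numbers are only examples):

{code}
// Compile against Spark and Hadoop, but do not bundle them into the application
// jar; the cluster provides the real versions at runtime.
libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"    % "1.2.0" % "provided",
  "org.apache.hadoop" %  "hadoop-client" % "2.4.0" % "provided"
)
{code}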

 Spark artifacts compiled with Hadoop 1.x
 

 Key: SPARK-5670
 URL: https://issues.apache.org/jira/browse/SPARK-5670
 Project: Spark
  Issue Type: Bug
  Components: Java API
Affects Versions: 1.2.0
 Environment: Spark 1.2
Reporter: DeepakVohra

 Why are Spark artifacts available from Maven compiled with Hadoop 1.x while 
 the Spark binaries for Hadoop 1.x are not available? Also CDH is not 
 available for Hadoop 1.x.
 Using Hadoop 2.0.0 or Hadoop 2.3 with Spark artifacts generates errors such as 
 the following.
 Server IPC version 7 cannot communicate with client version 4
 Server IPC version 9 cannot communicate with client version 4






[jira] [Closed] (SPARK-3760) Add Twitter4j FilterQuery to spark streaming twitter API

2015-02-07 Thread Eugene Zhulenev (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Zhulenev closed SPARK-3760.
--
Resolution: Won't Fix

 Add Twitter4j FilterQuery to spark streaming twitter API
 

 Key: SPARK-3760
 URL: https://issues.apache.org/jira/browse/SPARK-3760
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 1.1.0
Reporter: Eugene Zhulenev
Priority: Minor

 TwitterUtils.createStream(...) allows users to specify keywords that restrict 
 the tweets that are returned. However, FilterQuery from Twitter4j has a bunch 
 of other options, including location, which was asked for in SPARK-2788. The 
 best solution would be to add an alternative createStream method that takes a 
 FilterQuery as an argument instead of keywords.
 Pull Request: https://github.com/apache/spark/pull/2618






[jira] [Updated] (SPARK-5080) Expose more cluster resource information to user

2015-02-07 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-5080:

Component/s: Spark Core

 Expose more cluster resource information to user
 

 Key: SPARK-5080
 URL: https://issues.apache.org/jira/browse/SPARK-5080
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Rui Li

 It would be useful if users could get detailed cluster resource info, e.g. 
 granted/allocated executors, memory and CPU.
 Such information is available via the Web UI, but SparkContext doesn't seem 
 to expose these APIs.
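
A little of this is already reachable from the driver today; here is a small usage sketch of what SparkContext currently exposes (memory per executor only, not CPU or granted-versus-requested counts):

{code}
import org.apache.spark.SparkContext

// Sketch: print the per-executor memory view that SparkContext already provides.
def printExecutorMemory(sc: SparkContext): Unit = {
  // Map of "host:port" -> (maximum memory available for caching, remaining free memory), in bytes.
  sc.getExecutorMemoryStatus.foreach { case (executor, (maxMem, freeMem)) =>
    println(s"$executor: max=$maxMem bytes, free=$freeMem bytes")
  }
}
{code}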






[jira] [Updated] (SPARK-5524) Remove messy dependencies to log4j

2015-02-07 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-5524:

Component/s: Build

 Remove messy dependencies to log4j
 --

 Key: SPARK-5524
 URL: https://issues.apache.org/jira/browse/SPARK-5524
 Project: Spark
  Issue Type: Task
  Components: Build
Reporter: Jacek Lewandowski

 There are some tickets regarding loosening the dependency on Log4j, however 
 some classes still use the following scheme:
 {code}
   if (Logger.getLogger(classOf[SomeClass]).getLevel == null) {
 Logger.getLogger(classOf[SomeClass]).setLevel(someLevel)
   }
 {code}
 This doesn't look good and makes it difficult to track why some logs are 
 missing when you use Log4j and why they are flooding when you use something 
 else, like Logback. 
 There is a Logging class which checks whether we use Log4j or not. Why not 
 delegate all such invocations to the Logging class, which could handle them 
 properly, maybe considering more logging implementations?






[jira] [Resolved] (SPARK-5671) Bump jets3t version from 0.9.0 to 0.9.2 in hadoop-2.3 and hadoop-2.4 profiles

2015-02-07 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-5671.
---
   Resolution: Fixed
Fix Version/s: 1.3.0

Issue resolved by pull request 4454
[https://github.com/apache/spark/pull/4454]

 Bump jets3t version from 0.9.0 to 0.9.2 in hadoop-2.3 and hadoop-2.4 profiles
 -

 Key: SPARK-5671
 URL: https://issues.apache.org/jira/browse/SPARK-5671
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Josh Rosen
Assignee: Josh Rosen
 Fix For: 1.3.0


 Bumping the jets3t version from 0.9.0 to 0.9.2 for the hadoop-2.3 and 
 hadoop-2.4 profiles fixes a dependency conflict issue that was causing 
 UISeleniumSuite tests to fail with ClassNotFoundExceptions in the builds 
 with YARN.
 Jets3t release notes can be found here: 
 http://www.jets3t.org/RELEASE_NOTES.html






[jira] [Updated] (SPARK-5156) Priority queue for cross application scheduling

2015-02-07 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-5156:

Component/s: Scheduler

 Priority queue for cross application scheduling
 ---

 Key: SPARK-5156
 URL: https://issues.apache.org/jira/browse/SPARK-5156
 Project: Spark
  Issue Type: Wish
  Components: Scheduler
Reporter: Timothy Wilder
Priority: Minor

 FIFO is useful, but for many use cases, something more fine-grained would be 
 excellent. If possible, I would love to see an optional priority queue for 
 cross application scheduling.
 The gist of this would be that applications could be submitted with a 
 priority, and the highest priority application would be executed first.
 A means to do CRUD operations on the queue would also be fantastic.






[jira] [Commented] (SPARK-5524) Remove messy dependencies to log4j

2015-02-07 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311122#comment-14311122
 ] 

Nicholas Chammas commented on SPARK-5524:
-

Oh my bad. Thanks for the correction.

 Remove messy dependencies to log4j
 --

 Key: SPARK-5524
 URL: https://issues.apache.org/jira/browse/SPARK-5524
 Project: Spark
  Issue Type: Task
  Components: Spark Core
Reporter: Jacek Lewandowski

 There are some tickets regarding loosening the dependency on Log4j, however 
 some classes still use the following scheme:
 {code}
   if (Logger.getLogger(classOf[SomeClass]).getLevel == null) {
 Logger.getLogger(classOf[SomeClass]).setLevel(someLevel)
   }
 {code}
 This doesn't look good and makes it difficult to track why some logs are 
 missing when you use Log4j and why they are flooding when you use something 
 else, like Logback. 
 There is a Logging class which checks whether we use Log4j or not. Why not 
 delegate all such invocations to the Logging class, which could handle them 
 properly, maybe considering more logging implementations?






[jira] [Commented] (SPARK-5669) Spark assembly includes incompatibly licensed libgfortran, libgcc code via JBLAS

2015-02-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310931#comment-14310931
 ] 

Apache Spark commented on SPARK-5669:
-

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/4453

 Spark assembly includes incompatibly licensed libgfortran, libgcc code via 
 JBLAS
 

 Key: SPARK-5669
 URL: https://issues.apache.org/jira/browse/SPARK-5669
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Sean Owen
Priority: Blocker
 Fix For: 1.3.0


 Sorry for Blocker, but it's a license issue. The Spark assembly includes 
 the following, from JBLAS:
 {code}
 lib/
 lib/static/
 lib/static/Mac OS X/
 lib/static/Mac OS X/x86_64/
 lib/static/Mac OS X/x86_64/libjblas_arch_flavor.jnilib
 lib/static/Mac OS X/x86_64/sse3/
 lib/static/Mac OS X/x86_64/sse3/libjblas.jnilib
 lib/static/Windows/
 lib/static/Windows/x86/
 lib/static/Windows/x86/libgfortran-3.dll
 lib/static/Windows/x86/libgcc_s_dw2-1.dll
 lib/static/Windows/x86/jblas_arch_flavor.dll
 lib/static/Windows/x86/sse3/
 lib/static/Windows/x86/sse3/jblas.dll
 lib/static/Windows/amd64/
 lib/static/Windows/amd64/libgfortran-3.dll
 lib/static/Windows/amd64/jblas.dll
 lib/static/Windows/amd64/libgcc_s_sjlj-1.dll
 lib/static/Windows/amd64/jblas_arch_flavor.dll
 lib/static/Linux/
 lib/static/Linux/i386/
 lib/static/Linux/i386/sse3/
 lib/static/Linux/i386/sse3/libjblas.so
 lib/static/Linux/i386/libjblas_arch_flavor.so
 lib/static/Linux/amd64/
 lib/static/Linux/amd64/sse3/
 lib/static/Linux/amd64/sse3/libjblas.so
 lib/static/Linux/amd64/libjblas_arch_flavor.so
 {code}
 Unfortunately the libgfortran and libgcc libraries included for Windows are 
 not licensed in a way that's compatible with Spark and the AL2 -- LGPL at 
 least.
 It's easy to exclude them. I'm not clear what it does to running on Windows; 
 I assume it can still work but the libs would have to be made available 
 locally and put on the shared library path manually. I don't think there's a 
 package manager as in Linux that would make it easily available. I'm not able 
 to test on Windows.
 If it doesn't work, the follow-up question is whether that means JBLAS has to 
 be removed on the double, or treated as a known issue for 1.3.0.






[jira] [Updated] (SPARK-5626) Spurious test failures due to NullPointerException in EasyMock test code

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5626:
-
Component/s: Tests

 Spurious test failures due to NullPointerException in EasyMock test code
 

 Key: SPARK-5626
 URL: https://issues.apache.org/jira/browse/SPARK-5626
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.3.0
Reporter: Josh Rosen
  Labels: flaky-test
 Attachments: consoleText.txt


 I've seen a few cases where a test failure will trigger a cascade of spurious 
 failures when instantiating test suites that use EasyMock.  Here's a sample 
 symptom:
 {code}
 [info] CacheManagerSuite:
 [info] Exception encountered when attempting to run a suite with class name: 
 org.apache.spark.CacheManagerSuite *** ABORTED *** (137 milliseconds)
 [info]   java.lang.NullPointerException:
 [info]   at 
 org.objenesis.strategy.StdInstantiatorStrategy.newInstantiatorOf(StdInstantiatorStrategy.java:52)
 [info]   at 
 org.objenesis.ObjenesisBase.getInstantiatorOf(ObjenesisBase.java:90)
 [info]   at org.objenesis.ObjenesisBase.newInstance(ObjenesisBase.java:73)
 [info]   at org.objenesis.ObjenesisHelper.newInstance(ObjenesisHelper.java:43)
 [info]   at 
 org.easymock.internal.ObjenesisClassInstantiator.newInstance(ObjenesisClassInstantiator.java:26)
 [info]   at 
 org.easymock.internal.ClassProxyFactory.createProxy(ClassProxyFactory.java:219)
 [info]   at 
 org.easymock.internal.MocksControl.createMock(MocksControl.java:59)
 [info]   at org.easymock.EasyMock.createMock(EasyMock.java:103)
 [info]   at 
 org.scalatest.mock.EasyMockSugar$class.mock(EasyMockSugar.scala:267)
 [info]   at 
 org.apache.spark.CacheManagerSuite.mock(CacheManagerSuite.scala:28)
 [info]   at 
 org.apache.spark.CacheManagerSuite$$anonfun$1.apply$mcV$sp(CacheManagerSuite.scala:40)
 [info]   at 
 org.apache.spark.CacheManagerSuite$$anonfun$1.apply(CacheManagerSuite.scala:38)
 [info]   at 
 org.apache.spark.CacheManagerSuite$$anonfun$1.apply(CacheManagerSuite.scala:38)
 [info]   at 
 org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:195)
 [info]   at 
 org.apache.spark.CacheManagerSuite.runTest(CacheManagerSuite.scala:28)
 [info]   at 
 org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
 [info]   at 
 org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
 [info]   at 
 org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
 [info]   at 
 org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
 [info]   at scala.collection.immutable.List.foreach(List.scala:318)
 [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
 [info]   at 
 org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
 [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
 [info]   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
 [info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
 [info]   at org.scalatest.Suite$class.run(Suite.scala:1424)
 [info]   at 
 org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
 [info]   at 
 org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
 [info]   at 
 org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
 [info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
 [info]   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
 [info]   at 
 org.apache.spark.CacheManagerSuite.org$scalatest$BeforeAndAfter$$super$run(CacheManagerSuite.scala:28)
 [info]   at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241)
 [info]   at org.apache.spark.CacheManagerSuite.run(CacheManagerSuite.scala:28)
 [info]   at 
 org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
 [info]   at 
 org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
 [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:294)
 [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:284)
 [info]   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 [info]   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 [info]   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 [info]   at java.lang.Thread.run(Thread.java:745)
 {code}
 This is from 
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26852/consoleFull.






[jira] [Updated] (SPARK-4442) Move common unit test utilities into their own package / module

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-4442:
-
Component/s: Tests

 Move common unit test utilities into their own package / module
 ---

 Key: SPARK-4442
 URL: https://issues.apache.org/jira/browse/SPARK-4442
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Reporter: Josh Rosen
Priority: Minor

 We should move generally-useful unit test fixtures / utility methods to their 
 own test-utilities package / module to make them easier to find / use.
 See https://github.com/apache/spark/pull/3121#discussion-diff-20413659 for 
 one example of this.






[jira] [Updated] (SPARK-4424) Clean up all SparkContexts in unit tests so that spark.driver.allowMultipleContexts can be false

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-4424:
-
Component/s: Tests

 Clean up all SparkContexts in unit tests so that 
 spark.driver.allowMultipleContexts can be false
 

 Key: SPARK-4424
 URL: https://issues.apache.org/jira/browse/SPARK-4424
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Reporter: Josh Rosen
Priority: Minor

 This is a followup JIRA to SPARK-4180 to make sure that we finish refactoring 
 the unit tests so that all SparkContexts are properly cleaned up; since the 
 current tests don't perform proper cleanup, we currently need to set 
 {{spark.driver.allowMultipleContexts=true}} in the test configuration.
 It may be best to do this as part of a larger refactoring / cleanup of our 
 test code to use cleaner test fixture patterns.
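
One common fixture pattern this could converge on (an illustrative sketch, not the refactoring itself; the trait name is made up):

{code}
import org.apache.spark.SparkContext
import org.scalatest.{BeforeAndAfterEach, Suite}

// Illustrative fixture: whatever SparkContext a test assigns to `sc` is always
// stopped afterwards, so no suite leaks a context into the next one.
trait ManagedSparkContext extends BeforeAndAfterEach { self: Suite =>
  @transient var sc: SparkContext = _

  override protected def afterEach(): Unit = {
    try {
      if (sc != null) {
        sc.stop()
        sc = null
      }
    } finally {
      super.afterEach()
    }
  }
}
{code}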






[jira] [Updated] (SPARK-4746) integration tests should be separated from faster unit tests

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-4746:
-
Component/s: Tests

 integration tests should be separated from faster unit tests
 

 Key: SPARK-4746
 URL: https://issues.apache.org/jira/browse/SPARK-4746
 Project: Spark
  Issue Type: Bug
  Components: Tests
Reporter: Imran Rashid
Priority: Trivial

 Currently there isn't a good way for a developer to skip the longer 
 integration tests.  This can slow down local development.  See 
 http://apache-spark-developers-list.1001551.n3.nabble.com/Spurious-test-failures-testing-best-practices-td9560.html
 One option is to use ScalaTest's notion of test tags to tag all integration 
 tests, so they could easily be skipped.
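
A sketch of what that could look like with ScalaTest tags (the tag and suite names are illustrative):

{code}
import org.scalatest.{FunSuite, Tag}

// Illustrative tag object; long-running tests get tagged with it.
object IntegrationTest extends Tag("org.apache.spark.tags.IntegrationTest")

class ExampleSuite extends FunSuite {
  test("fast unit test") {
    assert(1 + 1 === 2)
  }

  test("slow end-to-end test", IntegrationTest) {
    // ... expensive setup and assertions ...
  }
}
{code}

Tagged tests could then be excluded locally by passing ScalaTest's exclude option for that tag (e.g. {{-l org.apache.spark.tags.IntegrationTest}}) to the runner.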






[jira] [Commented] (SPARK-5625) Spark binaries do not include Spark Core

2015-02-07 Thread DeepakVohra (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311012#comment-14311012
 ] 

DeepakVohra commented on SPARK-5625:


Spark artifacts are not in the Spark binaries/assembly. 

 Spark binaries do not include Spark Core
 ---

 Key: SPARK-5625
 URL: https://issues.apache.org/jira/browse/SPARK-5625
 Project: Spark
  Issue Type: Bug
  Components: Java API
Affects Versions: 1.2.0
 Environment: CDH4
Reporter: DeepakVohra

 Spark binaries for CDH 4 do not include the Spark Core Jar. 
 http://spark.apache.org/downloads.html






[jira] [Commented] (SPARK-5668) spark_ec2.py region parameter could be either mandatory or its value displayed

2015-02-07 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311054#comment-14311054
 ] 

Nicholas Chammas commented on SPARK-5668:
-

This sounds good to me, Miguel. I've been bitten by this before.

I favor option 1. Workable defaults are generally convenient to have, so I 
wouldn't want to make {{--region}} mandatory. Also, that would break the tool 
for those who have built scripts that invoke {{spark-ec2}} without specifying 
the region.

 spark_ec2.py region parameter could be either mandatory or its value displayed
 --

 Key: SPARK-5668
 URL: https://issues.apache.org/jira/browse/SPARK-5668
 Project: Spark
  Issue Type: Improvement
  Components: EC2
Affects Versions: 1.2.0, 1.3.0, 1.4.0
Reporter: Miguel Peralvo
Priority: Minor

 If the region parameter is not specified when invoking spark-ec2 
 (spark-ec2.py behind the scenes), it defaults to us-east-1. When the cluster 
 doesn't belong to that region, after showing the "Searching for existing 
 cluster Spark..." message, it causes an "ERROR: Could not find any existing 
 cluster" exception because it doesn't find your cluster in the default region.
 As it doesn't tell you anything about the region, it can be a small headache 
 for new users.
 In 
 http://stackoverflow.com/questions/21171576/why-does-spark-ec2-fail-with-error-could-not-find-any-existing-cluster,
  Dmitriy Selivanov explains it.
 I propose that:
 1. Either we make the search message a little bit more informative with 
 something like "Searching for existing cluster Spark in region " + 
 opts.region.
 2. Or we remove us-east-1 as the default and make the --region parameter 
 mandatory.






[jira] [Commented] (SPARK-5672) Don't return `ERROR 500` when args are missing

2015-02-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311059#comment-14311059
 ] 

Apache Spark commented on SPARK-5672:
-

User 'catap' has created a pull request for this issue:
https://github.com/apache/spark/pull/4239

 Don't return `ERROR 500` when args are missing
 ---

 Key: SPARK-5672
 URL: https://issues.apache.org/jira/browse/SPARK-5672
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Reporter: Kirill A. Korinskiy

 The Spark web UI returns HTTP ERROR 500 when a GET argument is missing.






[jira] [Commented] (SPARK-5625) Spark binaries do not include Spark Core

2015-02-07 Thread DeepakVohra (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311076#comment-14311076
 ] 

DeepakVohra commented on SPARK-5625:



The spark-assembly-1.2.0-hadoop2.0.0-mr1-cdh4.2.0.jar is not a valid archive.
http://s763.photobucket.com/user/dvohra10/media/SparkAssembly_zps4319294c.jpg.html?o=0

The spark-1.2.0-bin-cdh4.tgz is downloaded from 
http://www.apache.org/dyn/closer.cgi/spark/spark-1.2.0/spark-1.2.0-bin-cdh4.tgz

 Spark binaries do not include Spark Core
 ---

 Key: SPARK-5625
 URL: https://issues.apache.org/jira/browse/SPARK-5625
 Project: Spark
  Issue Type: Bug
  Components: Java API
Affects Versions: 1.2.0
 Environment: CDH4
Reporter: DeepakVohra

 Spark binaries for CDH 4 do not include the Spark Core Jar. 
 http://spark.apache.org/downloads.html






[jira] [Resolved] (SPARK-5053) Test maintenance branches on Jenkins using SBT

2015-02-07 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-5053.
---
Resolution: Fixed
  Assignee: Josh Rosen

I'm going to resolve this as fixed, since the scope of this JIRA was to create 
the maintenance SBT builds and to get them into a serviceable state.  The 
current problems with some of those builds are outside the original scope of 
this JIRA and will be addressed separately.

 Test maintenance branches on Jenkins using SBT
 --

 Key: SPARK-5053
 URL: https://issues.apache.org/jira/browse/SPARK-5053
 Project: Spark
  Issue Type: New Feature
  Components: Project Infra
Reporter: Josh Rosen
Assignee: Josh Rosen
Priority: Blocker

 We need to create Jenkins jobs to test maintenance branches using SBT.  The 
 current Maven jobs for backport branches do not run the same checks that the 
 pull request builder / SBT builds do (e.g. MiMa checks, PySpark, RAT, etc.) 
 which means that cherry-picking backports can silently break things and we'll 
 only discover it once PRs that are explicitly opened against those branches 
 fail tests; this long delay between introducing test failures and detecting 
 them is a huge productivity issue.






[jira] [Created] (SPARK-5672) Don't return `ERROR 500` when args are missing

2015-02-07 Thread Kirill A. Korinskiy (JIRA)
Kirill A. Korinskiy created SPARK-5672:
--

 Summary: Don't return `ERROR 500` when args are missing
 Key: SPARK-5672
 URL: https://issues.apache.org/jira/browse/SPARK-5672
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Reporter: Kirill A. Korinskiy


The Spark web UI returns HTTP ERROR 500 when a GET argument is missing.
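
Conceptually the fix is to validate request parameters and answer with a client error instead of letting an exception escape as HTTP 500. A generic servlet-level sketch (not Spark's actual Web UI code; the parameter name is made up):

{code}
import javax.servlet.http.{HttpServletRequest, HttpServletResponse}

// Illustrative handler fragment: a missing "id" parameter becomes a 400 Bad Request
// instead of an uncaught exception that surfaces as HTTP ERROR 500.
def render(request: HttpServletRequest, response: HttpServletResponse): Unit = {
  Option(request.getParameter("id")) match {
    case Some(id) =>
      response.getWriter.println(s"details for $id")
    case None =>
      response.sendError(HttpServletResponse.SC_BAD_REQUEST, "Missing required parameter: id")
  }
}
{code}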






[jira] [Updated] (SPARK-578) Fix interpreter code generation to only capture needed dependencies

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-578:

Priority: Major

 Fix interpreter code generation to only capture needed dependencies
 ---

 Key: SPARK-578
 URL: https://issues.apache.org/jira/browse/SPARK-578
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Matei Zaharia








[jira] [Updated] (SPARK-540) Add API to customize in-memory representation of RDDs

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-540:

Priority: Minor

 Add API to customize in-memory representation of RDDs
 -

 Key: SPARK-540
 URL: https://issues.apache.org/jira/browse/SPARK-540
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Reporter: Matei Zaharia
Priority: Minor

 Right now the choice between serialized caching and just Java objects in dev 
 is fine, but it might be cool to also support structures such as 
 column-oriented storage through arrays of primitives without forcing it 
 through the serialization interface.






[jira] [Updated] (SPARK-573) Clarify semantics of the parallelized closures

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-573:

Priority: Minor

 Clarify semantics of the parallelized closures
 --

 Key: SPARK-573
 URL: https://issues.apache.org/jira/browse/SPARK-573
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: tjhunter
Priority: Minor

 I do not think there is any guideline about which features of scala are 
 allowed/forbidden in the closure that gets sent to the remote nodes. Two 
 examples I have are a return statement and updating mutable variables of 
 singletons.
 Ideally, a compiler plugin could give an error at compile time, but a good 
 error message at run time would be good also.
 Are there any other cases that should not be allowed?
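
As a concrete illustration of the second case (a made-up example), updating a mutable variable of a singleton from inside a parallelized closure silently updates each executor's own copy of the singleton, not the driver's:

{code}
import org.apache.spark.SparkContext

object Counter {
  var total = 0  // mutable singleton state
}

def sumWrong(sc: SparkContext): Int = {
  val rdd = sc.parallelize(1 to 100)
  // Each executor JVM has its own Counter object; the driver's Counter.total
  // stays 0 in cluster mode, and no error or warning is reported.
  rdd.foreach(x => Counter.total += x)
  Counter.total
}
{code}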






[jira] [Resolved] (SPARK-5191) Pyspark: scheduler hangs when importing a standalone pyspark app

2015-02-07 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-5191.
---
Resolution: Not a Problem

I'm going to resolve this as Not a Problem since the problem here lies with the 
user code and not Spark itself (we might be able to fix this, but we can't 
guarantee that invalid user programs will work correctly).

 Pyspark: scheduler hangs when importing a standalone pyspark app
 

 Key: SPARK-5191
 URL: https://issues.apache.org/jira/browse/SPARK-5191
 Project: Spark
  Issue Type: Bug
  Components: PySpark, Scheduler
Affects Versions: 1.0.2, 1.1.1, 1.3.0, 1.2.1
Reporter: Daniel Liu

 In a.py:
 {code}
 from pyspark import SparkContext
 sc = SparkContext("local", "test spark")
 rdd = sc.parallelize(range(1, 10))
 print rdd.count()
 {code}
 In b.py:
 {code}
 from a import *
 {code}
 {{python a.py}} runs fine
 {{python b.py}} will hang at "TaskSchedulerImpl: Removed TaskSet 0.0, whose 
 tasks have all completed, from pool"
 {{./bin/spark-submit --py-files a.py b.py}} has the same problem






[jira] [Created] (SPARK-5670) Spark artifacts compiled with Hadoop 1.x

2015-02-07 Thread DeepakVohra (JIRA)
DeepakVohra created SPARK-5670:
--

 Summary: Spark artifacts compiled with Hadoop 1.x
 Key: SPARK-5670
 URL: https://issues.apache.org/jira/browse/SPARK-5670
 Project: Spark
  Issue Type: Bug
  Components: Java API
Affects Versions: 1.2.0
 Environment: Spark 1.2
Reporter: DeepakVohra


Why are Spark artifacts available from Maven compiled with Hadoop 1.x while the 
Spark binaries for Hadoop 1.x are not available? Also CDH is not available for 
Hadoop 1.x.

Using Hadoop 2.0.0 or Hadoop 2.3 with Spark artifacts generates errors such as 
the following.

Server IPC version 7 cannot communicate with client version 4
Server IPC version 9 cannot communicate with client version 4









[jira] [Commented] (SPARK-5670) Spark artifacts compiled with Hadoop 1.x

2015-02-07 Thread DeepakVohra (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311031#comment-14311031
 ] 

DeepakVohra commented on SPARK-5670:


I am not using Maven to run the Spark application, so I am not able to set 
provided scope. I am running the Spark application with a local master URL.

 Spark artifacts compiled with Hadoop 1.x
 

 Key: SPARK-5670
 URL: https://issues.apache.org/jira/browse/SPARK-5670
 Project: Spark
  Issue Type: Bug
  Components: Java API
Affects Versions: 1.2.0
 Environment: Spark 1.2
Reporter: DeepakVohra

 Why are Spark artifacts available from Maven compiled with Hadoop 1.x while 
 the Spark binaries for Hadoop 1.x are not available? Also CDH is not 
 available for Hadoop 1.x.
 Using Hadoop 2.0.0 or Hadoop 2.3 with Spark artifacts generates errors such as 
 the following.
 Server IPC version 7 cannot communicate with client version 4
 Server IPC version 9 cannot communicate with client version 4






[jira] [Updated] (SPARK-1142) Allow adding jars on app submission, outside of code

2015-02-07 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-1142:

Component/s: Spark Submit

 Allow adding jars on app submission, outside of code
 

 Key: SPARK-1142
 URL: https://issues.apache.org/jira/browse/SPARK-1142
 Project: Spark
  Issue Type: Improvement
  Components: Spark Submit
Affects Versions: 0.9.0
Reporter: Sandy Pérez González
Assignee: Sandy Pérez González

 yarn-standalone mode supports an option that allows adding jars that will be 
 distributed on the cluster with job submission.  Providing similar 
 functionality for other app submission modes will allow the spark-app script 
 proposed in SPARK-1126 to support an add-jars option that works for every 
 submit mode.






[jira] [Updated] (SPARK-4383) Delay scheduling doesn't work right when jobs have tasks with different locality levels

2015-02-07 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-4383:

Component/s: Scheduler

 Delay scheduling doesn't work right when jobs have tasks with different 
 locality levels
 ---

 Key: SPARK-4383
 URL: https://issues.apache.org/jira/browse/SPARK-4383
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.0.2, 1.1.0
Reporter: Kay Ousterhout

 Copied from mailing list discussion:
 Now our application will load data from HDFS in the same Spark cluster. It 
 will get NODE_LOCAL and RACK_LOCAL level tasks during the loading stage. If 
 the tasks in the loading stage have the same locality level, either 
 NODE_LOCAL or RACK_LOCAL, it works fine.
 But if the tasks in the loading stage get mixed locality levels, such as 3 
 NODE_LOCAL tasks and 2 RACK_LOCAL tasks, then the TaskSetManager of the 
 loading stage will submit the 3 NODE_LOCAL tasks as soon as resources are 
 offered and then wait for spark.locality.wait.node, which was set to 30 
 minutes; the 2 RACK_LOCAL tasks will wait 30 minutes even though resources 
 are available.
 Fixing this is quite tricky -- do we need to track the locality level 
 individually for each task?
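
For reference, the 30-minute wait in that report comes from configuration; the relevant settings can be tuned like this (values here are examples only, in milliseconds):

{code}
import org.apache.spark.SparkConf

// Example only: shrink the extra time the scheduler waits for a node-local or
// rack-local slot before falling back to a less local one.
val conf = new SparkConf()
  .setAppName("locality-example")
  .set("spark.locality.wait", "3000")
  .set("spark.locality.wait.node", "3000")
  .set("spark.locality.wait.rack", "3000")
{code}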






[jira] [Commented] (SPARK-5524) Remove messy dependencies to log4j

2015-02-07 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311121#comment-14311121
 ] 

Nicholas Chammas commented on SPARK-5524:
-

Oh my bad. Thanks for the correction.

 Remove messy dependencies to log4j
 --

 Key: SPARK-5524
 URL: https://issues.apache.org/jira/browse/SPARK-5524
 Project: Spark
  Issue Type: Task
  Components: Spark Core
Reporter: Jacek Lewandowski

 There are some tickets regarding loosening the dependency on Log4j, however 
 some classes still use the following scheme:
 {code}
   if (Logger.getLogger(classOf[SomeClass]).getLevel == null) {
 Logger.getLogger(classOf[SomeClass]).setLevel(someLevel)
   }
 {code}
 This doesn't look good and makes it difficult to track why some logs are 
 missing when you use Log4j and why they are flooding when you use something 
 else, like Logback. 
 There is a Logging class which checks whether we use Log4j or not. Why not 
 delegate all such invocations to the Logging class, which could handle them 
 properly, maybe considering more logging implementations?






[jira] [Commented] (SPARK-5625) Spark binaries do not include Spark Core

2015-02-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311022#comment-14311022
 ] 

Sean Owen commented on SPARK-5625:
--

The assembly jar is not extracted. It's a jar file like any other. It contains 
the core classes, as you can see with {{jar tf}}. Have you tried that? The 
binary distribution does not contain individual module artifacts. Those are 
published in Maven, since by themselves, they are only relevant as Maven 
artifacts. They are put together into an assembly for binary distributions. 
This is the thing you would use when actually deploying Spark on a cluster.

 Spark binaries do not include Spark Core
 ---

 Key: SPARK-5625
 URL: https://issues.apache.org/jira/browse/SPARK-5625
 Project: Spark
  Issue Type: Bug
  Components: Java API
Affects Versions: 1.2.0
 Environment: CDH4
Reporter: DeepakVohra

 Spark binaries for CDH 4 do not include the Spark Core Jar. 
 http://spark.apache.org/downloads.html






[jira] [Commented] (SPARK-5625) Spark binaries do not include Spark Core

2015-02-07 Thread DeepakVohra (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311029#comment-14311029
 ] 

DeepakVohra commented on SPARK-5625:


Thanks, yes, the assembly jar has the Spark artifact classes. I shall re-test 
to find out why the Spark classes are not getting found when a .scala file is 
compiled, even though spark-1.2.0-bin-cdh4/lib/* is in the classpath.

 Spark binaries do not include Spark Core
 ---

 Key: SPARK-5625
 URL: https://issues.apache.org/jira/browse/SPARK-5625
 Project: Spark
  Issue Type: Bug
  Components: Java API
Affects Versions: 1.2.0
 Environment: CDH4
Reporter: DeepakVohra

 Spark binaries for CDH 4 do not include the Spark Core Jar. 
 http://spark.apache.org/downloads.html






[jira] [Updated] (SPARK-4808) Spark fails to spill with small number of large objects

2015-02-07 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-4808:

Component/s: Spark Core

 Spark fails to spill with small number of large objects
 ---

 Key: SPARK-4808
 URL: https://issues.apache.org/jira/browse/SPARK-4808
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2, 1.1.0, 1.2.0, 1.2.1
Reporter: Dennis Lawler

 Spillable's maybeSpill does not allow spill to occur until at least 1000 
 elements have been spilled, and then will only evaluate spill every 32nd 
 element thereafter.  When there is a small number of very large items being 
 tracked, out-of-memory conditions may occur.
 I suspect that this and the every-32nd-element behavior were meant to reduce 
 the impact of the estimateSize() call.  This method was extracted into 
 SizeTracker, which implements its own exponential backoff for size estimation, 
 so now we are only avoiding using the resulting estimated size.
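
Paraphrased as code, the check being described is roughly the following (a simplified sketch of the logic, not the actual Spillable implementation):

{code}
// Simplified sketch of the spill condition described above.
def shouldSpill(elementsRead: Long, currentMemory: Long, myMemoryThreshold: Long): Boolean = {
  val trackMemoryThreshold = 1000      // spilling is not even considered before 1000 elements
  elementsRead > trackMemoryThreshold &&
    elementsRead % 32 == 0 &&          // and after that, only every 32nd element
    currentMemory >= myMemoryThreshold
}
{code}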






[jira] [Commented] (SPARK-5363) Spark 1.2 freeze without error notification

2015-02-07 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311050#comment-14311050
 ] 

Nicholas Chammas commented on SPARK-5363:
-

[~TJKlein] - Can you provide more information about the environment in which 
you see this error? Can you also come up with a simple repro script?

 Spark 1.2 freeze without error notification
 ---

 Key: SPARK-5363
 URL: https://issues.apache.org/jira/browse/SPARK-5363
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.2.0
Reporter: Tassilo Klein
Assignee: Davies Liu
Priority: Critical

 After a number of calls to a map().collect() statement, Spark freezes without 
 reporting any error.  Within the map a large broadcast variable is used.
 The freezing can be avoided by setting 'spark.python.worker.reuse = false' 
 (Spark 1.2) or by using an earlier version, however at the price of low speed. 
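For reference, a sketch of how the workaround mentioned above could be applied through SparkConf (shown in Scala here; in PySpark the same key would be set on its SparkConf). The app name is an assumption:

{code}
// Sketch of the workaround from the description: disable Python worker reuse.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("freeze-workaround")            // app name is an assumption
  .set("spark.python.worker.reuse", "false")  // key taken from the description above
{code}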



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5628) Add option to return spark-ec2 version

2015-02-07 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311055#comment-14311055
 ] 

Nicholas Chammas commented on SPARK-5628:
-

We still need a backport to 1.2.2 for this issue.

 Add option to return spark-ec2 version
 --

 Key: SPARK-5628
 URL: https://issues.apache.org/jira/browse/SPARK-5628
 Project: Spark
  Issue Type: Improvement
  Components: EC2
Reporter: Nicholas Chammas
Assignee: Nicholas Chammas
Priority: Minor
  Labels: backport-needed
 Fix For: 1.3.0, 1.2.2, 1.4.0


 We need a {{--version}} option for {{spark-ec2}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5625) Spark binaries do not incude Spark Core

2015-02-07 Thread DeepakVohra (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311073#comment-14311073
 ] 

DeepakVohra commented on SPARK-5625:


The spark-assembly-1.2.0-hadoop2.0.0-mr1-cdh4.2.0.jar has too many classes, 
which may be causing a classloading issue. The classes do not even get extracted 
with WinZip, which generates the following error.
 
Error: too many entries in central directory according to end of central 
directory info.

 Spark binaries do not incude Spark Core
 ---

 Key: SPARK-5625
 URL: https://issues.apache.org/jira/browse/SPARK-5625
 Project: Spark
  Issue Type: Bug
  Components: Java API
Affects Versions: 1.2.0
 Environment: CDH4
Reporter: DeepakVohra

 Spark binaries for CDH 4 do not include the Spark Core Jar. 
 http://spark.apache.org/downloads.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3431) Parallelize Scala/Java test execution

2015-02-07 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311135#comment-14311135
 ] 

Nicholas Chammas commented on SPARK-3431:
-

[~srowen] - Have you tried anything recently with parallelizing tests with 
Maven?

 Parallelize Scala/Java test execution
 -

 Key: SPARK-3431
 URL: https://issues.apache.org/jira/browse/SPARK-3431
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Nicholas Chammas
Assignee: Nicholas Chammas
 Attachments: SPARK-3431-srowen-attempt.patch


 Running all the tests in {{dev/run-tests}} takes up to 2 hours. A common 
 strategy to cut test time down is to parallelize the execution of the tests. 
 Doing that may in turn require some prerequisite changes to be made to how 
 certain tests run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1967) Using parallelize method to create RDD, wordcount app just hanging there without errors or warnings

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1967.
--
Resolution: Cannot Reproduce

 Using parallelize method to create RDD, wordcount app just hanging there 
 without errors or warnings
 ---

 Key: SPARK-1967
 URL: https://issues.apache.org/jira/browse/SPARK-1967
 Project: Spark
  Issue Type: Bug
Affects Versions: 0.9.1
 Environment: Ubuntu-12.04, single machine spark standalone, 8 core, 
 8G mem, spark 0.9.1, java-1.7
Reporter: Min Li

 I was trying the parallelize method to create an RDD. I used Java. It's a 
 simple wordcount program, except that I first read the input into memory and 
 then use the parallelize method to create the RDD, rather than the default 
 textFile method in the given example. 
 Pseudo code:
 JavaSparkContext ctx = new JavaSparkContext($SparkMasterURL, $NAME, 
 $SparkHome, $jars);
 List<String> input = ... // read lines from the input file into an ArrayList<String>
 JavaRDD<String> lines = ctx.parallelize(input);
 // followed by wordcount
 The above is not working.
 JavaRDD<String> lines = ctx.textFile(file);
 // followed by wordcount
 This is working.
 The log is:
 14/05/29 16:18:43 INFO Slf4jLogger: Slf4jLogger started
 14/05/29 16:18:43 INFO Remoting: Starting remoting
 14/05/29 16:18:43 INFO Remoting: Remoting started; listening on addresses 
 :[akka.tcp://spark@spark:55224]
 14/05/29 16:18:43 INFO Remoting: Remoting now listens on addresses: 
 [akka.tcp://spark@spark:55224]
 14/05/29 16:18:43 INFO SparkEnv: Registering BlockManagerMaster
 14/05/29 16:18:43 INFO DiskBlockManager: Created local directory at 
 /tmp/spark-local-20140529161843-836a
 14/05/29 16:18:43 INFO MemoryStore: MemoryStore started with capacity 1056.0 
 MB.
 14/05/29 16:18:43 INFO ConnectionManager: Bound socket to port 42942 with id 
 = ConnectionManagerId(spark,42942)
 14/05/29 16:18:43 INFO BlockManagerMaster: Trying to register BlockManager
 14/05/29 16:18:43 INFO BlockManagerMasterActor$BlockManagerInfo: Registering 
 block manager spark:42942 with 1056.0 MB RAM
 14/05/29 16:18:43 INFO BlockManagerMaster: Registered BlockManager
 14/05/29 16:18:43 INFO HttpServer: Starting HTTP Server
 14/05/29 16:18:43 INFO HttpBroadcast: Broadcast server started at 
 http://10.227.119.185:43522
 14/05/29 16:18:43 INFO SparkEnv: Registering MapOutputTracker
 14/05/29 16:18:43 INFO HttpFileServer: HTTP File server directory is 
 /tmp/spark-3704a621-789c-4d97-b1fc-9654236dba3e
 14/05/29 16:18:43 INFO HttpServer: Starting HTTP Server
 14/05/29 16:18:43 INFO SparkUI: Started Spark Web UI at http://spark:4040
 14/05/29 16:18:44 INFO SparkContext: Added JAR 
 /home/maxmin/tmp/spark-test-1.0-SNAPSHOT-jar-with-dependencies.jar at 
 http://10.227.119.185:55286/jars/spark-test-1.0-SNAPSHOT-jar-with-dependencies.jar
  with timestamp 1401394724045
 14/05/29 16:18:44 INFO AppClient$ClientActor: Connecting to master 
 spark://spark:7077...
 14/05/29 16:18:44 INFO SparkDeploySchedulerBackend: Connected to Spark 
 cluster with app ID app-20140529161844-0001
 14/05/29 16:18:44 INFO AppClient$ClientActor: Executor added: 
 app-20140529161844-0001/0 on worker-20140529155406-spark-59658 (spark:59658) 
 with 8 cores
 The app hangs here forever, and spark:8080 / spark:4040 are not showing 
 any strange info. The Spark Stages page shows the active stage is 
 reduceByKey, with tasks Succeeded/Total at 0/2. I've also tried directly 
 calling lines.count after parallelize, and the app gets stuck at the count 
 stage.
 I've also tried using a static string list with parallelize to create the 
 RDD. This time the app still hangs, but the stages show nothing active, and 
 the log is similar. 
 I used spark-0.9.1 with the default spark-env.sh. In the slaves file I have 
 only one host. I used Maven to compile a fat jar with Spark specified as 
 provided. I modified the run-example script to submit the jar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-625) Client hangs when connecting to standalone cluster using wrong address

2015-02-07 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-625.
--
Resolution: Fixed

Let's resolve this as Fixed for now.  Reducing Akka's sensitivity to 
hostnames is a more general issue, and we may have a fix for it in the future, 
either by upgrading to a version of Akka that differentiates between bound and 
advertised addresses or by replacing Akka with a different communications 
layer.  I don't think we've observed the "hangs indefinitely" behavior described 
in this ticket for many versions, so I think this should be safe to close.

 Client hangs when connecting to standalone cluster using wrong address
 --

 Key: SPARK-625
 URL: https://issues.apache.org/jira/browse/SPARK-625
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.7.0, 0.7.1, 0.8.0
Reporter: Josh Rosen
Priority: Minor

 I launched a standalone cluster on my laptop, connecting the workers to the 
 master using my machine's public IP address (128.32.*.*:7077).  If I try to 
 connect spark-shell to the master using spark://0.0.0.0:7077, it 
 successfully brings up a Scala prompt but hangs when I try to run a job.
 From the standalone master's log, it looks like the client's messages are 
 being dropped without the client discovering that the connection has failed:
 {code}
 12/11/27 14:00:52 ERROR NettyRemoteTransport(null): dropping message 
 RegisterJob(JobDescription(Spark shell)) for non-local recipient 
 akka://spark@0.0.0.0:7077/user/Master at akka://spark@128.32.*.*:7077 local 
 is akka://spark@128.32.*.*:7077
 12/11/27 14:00:52 ERROR NettyRemoteTransport(null): dropping message 
 DaemonMsgWatch(Actor[akka://spark@128.32.*.*:57518/user/$a],Actor[akka://spark@0.0.0.0:7077/user/Master])
  for non-local recipient akka://spark@0.0.0.0:7077/remote at 
 akka://spark@128.32.*.*:7077 local is akka://spark@128.32.*.*:7077
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3760) Add Twitter4j FilterQuery to spark streaming twitter API

2015-02-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311010#comment-14311010
 ] 

Sean Owen commented on SPARK-3760:
--

The PR was abandoned. Is this WontFix? It kind of overlaps with the 
functionality of SPARK-2788 which should still really make it over the line. 
Collectively does that provide enough functionality from this basic example?

 Add Twitter4j FilterQuery to spark streaming twitter API
 

 Key: SPARK-3760
 URL: https://issues.apache.org/jira/browse/SPARK-3760
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 1.1.0
Reporter: Eugene Zhulenev
Priority: Minor

 TwitterUtils.createStream(...) allows users to specify keywords that restrict 
 the tweets that are returned. However, FilterQuery from Twitter4j has a number 
 of other options, including the location filter asked for in SPARK-2788. The 
 best solution would be to add an alternative createStream method that takes a 
 FilterQuery argument instead of keywords (see the sketch below).
 Pull Request: https://github.com/apache/spark/pull/2618
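A hypothetical sketch of what such an overload could look like; the FilterQuery-based createStream call below is the proposal, not an existing API, and ssc is an assumed StreamingContext:

{code}
// Hypothetical usage of the proposed overload (does not exist yet).
import twitter4j.FilterQuery
import org.apache.spark.streaming.twitter.TwitterUtils

val query = new FilterQuery().track("spark", "hadoop")    // any Twitter4j filter options
val tweets = TwitterUtils.createStream(ssc, None, query)  // proposed: FilterQuery instead of keywords
{code}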



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5671) Bump jets3t version from 0.9.0 to 0.9.2 in hadoop-2.3 and hadoop-2.4 profiles

2015-02-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311017#comment-14311017
 ] 

Apache Spark commented on SPARK-5671:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/4454

 Bump jets3t version from 0.9.0 to 0.9.2 in hadoop-2.3 and hadoop-2.4 profiles
 -

 Key: SPARK-5671
 URL: https://issues.apache.org/jira/browse/SPARK-5671
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Josh Rosen
Assignee: Josh Rosen

 Bumping the jets3t version from 0.9.0 to 0.9.2 for the hadoop-2.3 and 
 hadoop-2.4 profiles fixes a dependency conflict that was causing 
 UISeleniumSuite tests to fail with ClassNotFoundExceptions in the 
 "with YARN" builds.
 Jets3t release notes can be found here: 
 http://www.jets3t.org/RELEASE_NOTES.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-4383) Delay scheduling doesn't work right when jobs have tasks with different locality levels

2015-02-07 Thread Kay Ousterhout (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Ousterhout resolved SPARK-4383.
---
   Resolution: Fixed
Fix Version/s: 1.3.0

 Delay scheduling doesn't work right when jobs have tasks with different 
 locality levels
 ---

 Key: SPARK-4383
 URL: https://issues.apache.org/jira/browse/SPARK-4383
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.0.2, 1.1.0
Reporter: Kay Ousterhout
 Fix For: 1.3.0


 Copied from mailing list discussion:
 Our application loads data from HDFS in the same Spark cluster, so it gets 
 NODE_LOCAL and RACK_LOCAL level tasks during the loading stage. If the tasks 
 in the loading stage all have the same locality level, either NODE_LOCAL or 
 RACK_LOCAL, it works fine.
 But if the tasks in the loading stage have mixed locality levels, such as 3 
 NODE_LOCAL tasks and 2 RACK_LOCAL tasks, then the TaskSetManager of the 
 loading stage will submit the 3 NODE_LOCAL tasks as soon as resources are 
 offered and then wait for spark.locality.wait.node, which was set to 30 
 minutes; the 2 RACK_LOCAL tasks will wait 30 minutes even though resources 
 are available.
 Fixing this is quite tricky -- do we need to track the locality level 
 individually for each task?
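As an illustration of the knob involved (not the scheduler fix merged for this ticket), lowering the per-node locality wait bounds how long those RACK_LOCAL tasks sit idle. The 3000 ms value is an arbitrary example:

{code}
// Sketch only: shrink the node-locality wait so mixed-locality task sets
// fall back to RACK_LOCAL sooner. Value is in milliseconds in Spark 1.x.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.locality.wait.node", "3000")
{code}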



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5668) spark_ec2.py region parameter could be either mandatory or its value displayed

2015-02-07 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-5668:

Labels: starter  (was: )

 spark_ec2.py region parameter could be either mandatory or its value displayed
 --

 Key: SPARK-5668
 URL: https://issues.apache.org/jira/browse/SPARK-5668
 Project: Spark
  Issue Type: Improvement
  Components: EC2
Affects Versions: 1.2.0, 1.3.0, 1.4.0
Reporter: Miguel Peralvo
Priority: Minor
  Labels: starter

 If the region parameter is not specified when invoking spark-ec2 
 (spark_ec2.py behind the scenes), it defaults to us-east-1. When the cluster 
 doesn't belong to that region, after showing the "Searching for existing 
 cluster Spark..." message, it fails with an "ERROR: Could not find any 
 existing cluster" exception because it doesn't find your cluster in the 
 default region.
 As it doesn't tell you anything about the region, it can be a small headache 
 for new users.
 In 
 http://stackoverflow.com/questions/21171576/why-does-spark-ec2-fail-with-error-could-not-find-any-existing-cluster,
  Dmitriy Selivanov explains it.
 I propose that:
 1. Either we make the search message a little more informative, with 
 something like "Searching for existing cluster Spark in region " + 
 opts.region.
 2. Or we remove us-east-1 as the default and make the --region parameter 
 mandatory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5425) ConcurrentModificationException during SparkConf creation

2015-02-07 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-5425.
---
  Resolution: Fixed
   Fix Version/s: 1.2.2
Target Version/s:   (was: 1.2.2)

I've merged this into `branch-1.2` (1.2.2), completing the backports.

 ConcurrentModificationException during SparkConf creation
 -

 Key: SPARK-5425
 URL: https://issues.apache.org/jira/browse/SPARK-5425
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.1, 1.2.0
Reporter: Jacek Lewandowski
Assignee: Jacek Lewandowski
 Fix For: 1.3.0, 1.1.2, 1.2.2


 This fragment of code:
 {code}
   if (loadDefaults) {
     // Load any spark.* system properties
     for ((k, v) <- System.getProperties.asScala if k.startsWith("spark.")) {
       settings(k) = v
     }
   }
 {code}
 causes 
 {noformat}
 ERROR 09:43:15  SparkMaster service caused error in state 
 STARTINGjava.util.ConcurrentModificationException: null
   at java.util.Hashtable$Enumerator.next(Hashtable.java:1167) 
 ~[na:1.7.0_60]
   at 
 scala.collection.convert.Wrappers$JPropertiesWrapper$$anon$3.next(Wrappers.scala:458)
  ~[scala-library-2.10.4.jar:na]
   at 
 scala.collection.convert.Wrappers$JPropertiesWrapper$$anon$3.next(Wrappers.scala:454)
  ~[scala-library-2.10.4.jar:na]
   at scala.collection.Iterator$class.foreach(Iterator.scala:727) 
 ~[scala-library-2.10.4.jar:na]
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) 
 ~[scala-library-2.10.4.jar:na]
   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) 
 ~[scala-library-2.10.4.jar:na]
   at scala.collection.AbstractIterable.foreach(Iterable.scala:54) 
 ~[scala-library-2.10.4.jar:na]
   at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
  ~[scala-library-2.10.4.jar:na]
   at org.apache.spark.SparkConf.init(SparkConf.scala:53) 
 ~[spark-core_2.10-1.2.1_dse-20150121.075638-2.jar:1.2.1_dse-SNAPSHOT]
   at org.apache.spark.SparkConf.init(SparkConf.scala:47) 
 ~[spark-core_2.10-1.2.1_dse-20150121.075638-2.jar:1.2.1_dse-SNAPSHOT]
 {noformat}
 when another thread modifies system properties at the same time. 
 The Scala bug https://issues.scala-lang.org/browse/SI-7775 is related to this 
 issue and shows that the problem has also been found elsewhere. 
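One possible mitigation, shown as a sketch (not necessarily the change that was merged): iterate over a snapshot of the property names rather than over the live Properties object, so a concurrent System.setProperty cannot invalidate the iterator. The settings map below stands in for the one in the fragment above:

{code}
// Sketch: copy the property names first, then read values one by one.
import scala.collection.JavaConverters._
import scala.collection.mutable

val settings = mutable.HashMap[String, String]()   // stands in for the settings map above
val names = System.getProperties.stringPropertyNames.asScala.toSet
for (k <- names if k.startsWith("spark.")) {
  val v = System.getProperty(k)
  if (v != null) settings(k) = v
}
{code}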



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-985) Support Job Cancellation on Mesos Scheduler

2015-02-07 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-985.
--
   Resolution: Fixed
Fix Version/s: 1.1.1
   1.2.0

I'm pretty sure that this was resolved by SPARK-3597 in 1.1.1 and 1.2.0: now 
that MesosSchedulerBackend implements killTask, I think we now have support for 
job cancellation on Mesos.  I'm going to mark this as Resolved, but feel free 
to re-open if there's still work to be done.

 Support Job Cancellation on Mesos Scheduler
 ---

 Key: SPARK-985
 URL: https://issues.apache.org/jira/browse/SPARK-985
 Project: Spark
  Issue Type: Improvement
  Components: Mesos, Scheduler
Affects Versions: 0.9.0
Reporter: Josh Rosen
 Fix For: 1.2.0, 1.1.1


 https://github.com/apache/incubator-spark/pull/29 added job cancellation but 
 may still need support for Mesos scheduler backends:
 Quote: 
 {quote}
 This looks good except that MesosSchedulerBackend isn't yet calling Mesos's 
 killTask. Do you want to add that too or are you planning to push it till 
 later? I don't think it's a huge change.
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5671) Bump jets3t version from 0.9.0 to 0.9.3 in hadoop-2.3 and hadoop-2.4 profiles

2015-02-07 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-5671:
-

 Summary: Bump jets3t version from 0.9.0 to 0.9.3 in hadoop-2.3 and 
hadoop-2.4 profiles
 Key: SPARK-5671
 URL: https://issues.apache.org/jira/browse/SPARK-5671
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Josh Rosen
Assignee: Josh Rosen


Bumping the jets3t version from 0.9.0 to 0.9.2 for the hadoop-2.3 and 
hadoop-2.4 profiles fixes a dependency conflict that was causing 
UISeleniumSuite tests to fail with ClassNotFoundExceptions in the 
"with YARN" builds.

Jets3t release notes can be found here: http://www.jets3t.org/RELEASE_NOTES.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5363) Spark 1.2 freeze without error notification

2015-02-07 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-5363:

Component/s: PySpark

 Spark 1.2 freeze without error notification
 ---

 Key: SPARK-5363
 URL: https://issues.apache.org/jira/browse/SPARK-5363
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.2.0
Reporter: Tassilo Klein
Assignee: Davies Liu
Priority: Critical

 After a number of calls to a map().collect() statement, Spark freezes without 
 reporting any error.  Within the map a large broadcast variable is used.
 The freezing can be avoided by setting 'spark.python.worker.reuse = false' 
 (Spark 1.2) or by using an earlier version, however at the price of low speed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5175) bug in updating counters when starting multiple workers/supervisors in actor-based receiver

2015-02-07 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-5175:

Component/s: (was: Spark Core)
 Streaming

 bug in updating counters when starting multiple workers/supervisors in 
 actor-based receiver
 ---

 Key: SPARK-5175
 URL: https://issues.apache.org/jira/browse/SPARK-5175
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.2.0
Reporter: Nan Zhu

 When starting multiple workers (ActorReceiver.scala), the counters for them 
 are not updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5259) Fix endless retry stage by add task equal() and hashcode() to avoid stage.pendingTasks not empty while stage map output is available

2015-02-07 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-5259:

Component/s: Spark Core

 Fix endless retry stage by add task equal() and hashcode() to avoid 
 stage.pendingTasks not empty while stage map output is available 
 -

 Key: SPARK-5259
 URL: https://issues.apache.org/jira/browse/SPARK-5259
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.1, 1.2.0
Reporter: SuYan

 1. While a shuffle stage is retried, there may be 2 task sets running. 
 Call them taskSet0.0 and taskSet0.1; taskSet0.1 re-runs taskSet0.0's 
 incomplete tasks.
 If taskSet0.0 finishes all the tasks that taskSet0.1 has not completed yet, 
 covering all of the partitions, then the stage's isAvailable becomes true:
 {code}
   def isAvailable: Boolean = {
     if (!isShuffleMap) {
       true
     } else {
       numAvailableOutputs == numPartitions
     }
   }
 {code}
 But stage.pendingTasks is not empty, which blocks registering the map 
 statuses in mapOutputTracker.
 This happens because when a task completes successfully, it is removed from 
 pendingTasks by reference (pendingTasks -= task), since Task does not 
 override hashCode() and equals(), whereas numAvailableOutputs is counted by 
 partition ID.
 Here is a test case to demonstrate it:
 {code}
   test("Make sure mapStage.pendingtasks is set() " +
     "while MapStage.isAvailable is true while stage was retry ") {
     val firstRDD = new MyRDD(sc, 6, Nil)
     val firstShuffleDep = new ShuffleDependency(firstRDD, null)
     val firstShuyffleId = firstShuffleDep.shuffleId
     val shuffleMapRdd = new MyRDD(sc, 6, List(firstShuffleDep))
     val shuffleDep = new ShuffleDependency(shuffleMapRdd, null)
     val shuffleId = shuffleDep.shuffleId
     val reduceRdd = new MyRDD(sc, 2, List(shuffleDep))
     submit(reduceRdd, Array(0, 1))
     complete(taskSets(0), Seq(
       (Success, makeMapStatus("hostB", 1)),
       (Success, makeMapStatus("hostB", 2)),
       (Success, makeMapStatus("hostC", 3)),
       (Success, makeMapStatus("hostB", 4)),
       (Success, makeMapStatus("hostB", 5)),
       (Success, makeMapStatus("hostC", 6))
     ))
     complete(taskSets(1), Seq(
       (Success, makeMapStatus("hostA", 1)),
       (Success, makeMapStatus("hostB", 2)),
       (Success, makeMapStatus("hostA", 1)),
       (Success, makeMapStatus("hostB", 2)),
       (Success, makeMapStatus("hostA", 1))
     ))
     runEvent(ExecutorLost("exec-hostA"))
     runEvent(CompletionEvent(taskSets(1).tasks(0), Resubmitted, null, null,
       null, null))
     runEvent(CompletionEvent(taskSets(1).tasks(2), Resubmitted, null, null,
       null, null))
     runEvent(CompletionEvent(taskSets(1).tasks(0),
       FetchFailed(null, firstShuyffleId, -1, 0, "Fetch Mata data failed"),
       null, null, null, null))
     scheduler.resubmitFailedStages()
     runEvent(CompletionEvent(taskSets(1).tasks(0), Success,
       makeMapStatus("hostC", 1), null, null, null))
     runEvent(CompletionEvent(taskSets(1).tasks(2), Success,
       makeMapStatus("hostC", 1), null, null, null))
     runEvent(CompletionEvent(taskSets(1).tasks(4), Success,
       makeMapStatus("hostC", 1), null, null, null))
     runEvent(CompletionEvent(taskSets(1).tasks(5), Success,
       makeMapStatus("hostB", 2), null, null, null))
     val stage = scheduler.stageIdToStage(taskSets(1).stageId)
     assert(stage.attemptId == 2)
     assert(stage.isAvailable)
     assert(stage.pendingTasks.size == 0)
   }
 {code}
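A hypothetical sketch of the equality the title asks for (not the actual Spark Task class): compare tasks by stage and partition instead of by reference, so a task finished by an earlier attempt also clears pendingTasks for the retried stage. Field names are assumptions:

{code}
// Hypothetical sketch only; not the real Task class.
class TaskKeyed(val stageId: Int, val partitionId: Int) {
  override def equals(other: Any): Boolean = other match {
    case t: TaskKeyed => t.stageId == stageId && t.partitionId == partitionId
    case _            => false
  }
  override def hashCode(): Int = 31 * stageId + partitionId
}
{code}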



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5175) bug in updating counters when starting multiple workers/supervisors in actor-based receiver

2015-02-07 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-5175:

Component/s: Spark Core

 bug in updating counters when starting multiple workers/supervisors in 
 actor-based receiver
 ---

 Key: SPARK-5175
 URL: https://issues.apache.org/jira/browse/SPARK-5175
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.2.0
Reporter: Nan Zhu

 When starting multiple workers (ActorReceiver.scala), the counters for them 
 are not updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-5524) Remove messy dependencies to log4j

2015-02-07 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-5524:

Comment: was deleted

(was: Oh my bad. Thanks for the correction.)

 Remove messy dependencies to log4j
 --

 Key: SPARK-5524
 URL: https://issues.apache.org/jira/browse/SPARK-5524
 Project: Spark
  Issue Type: Task
  Components: Spark Core
Reporter: Jacek Lewandowski

 There are some tickets regarding loosening the dependency on Log4j; however, 
 some classes still use the following scheme:
 {code}
   if (Logger.getLogger(classOf[SomeClass]).getLevel == null) {
     Logger.getLogger(classOf[SomeClass]).setLevel(someLevel)
   }
 {code}
 This doesn't look good and makes it difficult to track why some logs are 
 missing when you use Log4j, and why they are flooding when you use something 
 else, like Logback. 
 There is a Logging class which checks whether we use Log4j or not. Why not 
 delegate all such invocations to it, so the Logging class could handle them 
 properly, maybe considering more logging implementations?
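A hypothetical sketch of the kind of delegation being suggested (not an existing Spark class): keep the "set a default level only if none is configured" check in one helper, so callers never touch Log4j directly.

{code}
// Hypothetical helper; in a real change the Logging class would first decide
// whether log4j is the active backend before doing anything.
import org.apache.log4j.{Level, Logger}

object DefaultLogLevel {
  def applyIfUnset(clazz: Class[_], level: Level): Unit = {
    val logger = Logger.getLogger(clazz)
    if (logger.getLevel == null) {   // same check as the fragment above
      logger.setLevel(level)
    }
  }
}
{code}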



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5673) Implement Streaming wrapper for all linear methos

2015-02-07 Thread Kirill A. Korinskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill A. Korinskiy updated SPARK-5673:
---
Description: 
Now spark had streaming wrapper for Logistic and Linear regressions only.

So, implement wrapper for SVM, Lasso and Ridge Regression will make streaming 
fashion more useful.

  was:
Now spark had only streaming wrapper for Logistic and Linear regressions only.

So, implement wrapper for SVM, Lasso and Ridge Regression will make streaming 
fashion more useful.


 Implement Streaming wrapper for all linear methos
 -

 Key: SPARK-5673
 URL: https://issues.apache.org/jira/browse/SPARK-5673
 Project: Spark
  Issue Type: New Feature
Reporter: Kirill A. Korinskiy

 Spark currently has streaming wrappers only for logistic and linear regression.
 Implementing wrappers for SVM, Lasso, and ridge regression as well would make 
 the streaming API more useful.
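For context, a usage sketch of the existing streaming wrapper the ticket refers to (StreamingLinearRegressionWithSGD); the ticket asks for analogous wrappers for SVM, Lasso, and ridge regression. The training/test stream and numFeatures names below are assumptions:

{code}
// Sketch of the existing streaming linear-regression wrapper in MLlib.
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, StreamingLinearRegressionWithSGD}
import org.apache.spark.streaming.dstream.DStream

def train(training: DStream[LabeledPoint], test: DStream[LabeledPoint], numFeatures: Int): Unit = {
  val model = new StreamingLinearRegressionWithSGD()
    .setInitialWeights(Vectors.zeros(numFeatures))
  model.trainOn(training)                                              // update the model on each batch
  model.predictOnValues(test.map(lp => (lp.label, lp.features))).print()
}
{code}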



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


