[jira] [Resolved] (SPARK-5108) Need to make jackson dependency version consistent with hadoop-2.6.0.
[ https://issues.apache.org/jira/browse/SPARK-5108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5108. -- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 3938 [https://github.com/apache/spark/pull/3938] Need to make jackson dependency version consistent with hadoop-2.6.0. - Key: SPARK-5108 URL: https://issues.apache.org/jira/browse/SPARK-5108 Project: Spark Issue Type: Bug Components: Build Reporter: Zhan Zhang Fix For: 1.3.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5108) Need to make jackson dependency version consistent with hadoop-2.6.0.
[ https://issues.apache.org/jira/browse/SPARK-5108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5108: - Assignee: Zhan Zhang Need to make jackson dependency version consistent with hadoop-2.6.0. - Key: SPARK-5108 URL: https://issues.apache.org/jira/browse/SPARK-5108 Project: Spark Issue Type: Bug Components: Build Reporter: Zhan Zhang Assignee: Zhan Zhang Fix For: 1.3.0
[jira] [Created] (SPARK-5669) Spark assembly includes incompatibly licensed libgfortran, libgcc code via JBLAS
Sean Owen created SPARK-5669: Summary: Spark assembly includes incompatibly licensed libgfortran, libgcc code via JBLAS Key: SPARK-5669 URL: https://issues.apache.org/jira/browse/SPARK-5669 Project: Spark Issue Type: Bug Components: Build Reporter: Sean Owen Priority: Blocker Fix For: 1.3.0 Sorry for Blocker, but it's a license issue. The Spark assembly includes the following, from JBLAS: {code} lib/ lib/static/ lib/static/Mac OS X/ lib/static/Mac OS X/x86_64/ lib/static/Mac OS X/x86_64/libjblas_arch_flavor.jnilib lib/static/Mac OS X/x86_64/sse3/ lib/static/Mac OS X/x86_64/sse3/libjblas.jnilib lib/static/Windows/ lib/static/Windows/x86/ lib/static/Windows/x86/libgfortran-3.dll lib/static/Windows/x86/libgcc_s_dw2-1.dll lib/static/Windows/x86/jblas_arch_flavor.dll lib/static/Windows/x86/sse3/ lib/static/Windows/x86/sse3/jblas.dll lib/static/Windows/amd64/ lib/static/Windows/amd64/libgfortran-3.dll lib/static/Windows/amd64/jblas.dll lib/static/Windows/amd64/libgcc_s_sjlj-1.dll lib/static/Windows/amd64/jblas_arch_flavor.dll lib/static/Linux/ lib/static/Linux/i386/ lib/static/Linux/i386/sse3/ lib/static/Linux/i386/sse3/libjblas.so lib/static/Linux/i386/libjblas_arch_flavor.so lib/static/Linux/amd64/ lib/static/Linux/amd64/sse3/ lib/static/Linux/amd64/sse3/libjblas.so lib/static/Linux/amd64/libjblas_arch_flavor.so {code} Unfortunately the libgfortran and libgcc libraries included for Windows are not licensed in a way that's compatible with Spark and the AL2 -- LGPL at least. It's easy to exclude them. I'm not clear what it does to running on Windows; I assume it can still work but the libs would have to be made available locally and put on the shared library path manually. I don't think there's a package manager as in Linux that would make it easily available. I'm not able to test on Windows. If it doesn't work, the follow-up question is whether that means JBLAS has to be removed on the double, or treated as a known issue for 1.3.0. 
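One way the LGPL-licensed Windows natives could be excluded from the assembly is a maven-shade-plugin filter on the JBLAS artifact. This is only an illustrative sketch, not the actual change from the linked pull request; the plugin placement and the `lib/static/Windows/**` pattern are assumptions derived from the listing above:

```xml
<!-- Hypothetical shade-plugin filter: drop the LGPL-licensed Windows
     native libraries (libgfortran, libgcc) that JBLAS bundles, while
     keeping the rest of the jar intact. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <filters>
      <filter>
        <artifact>org.jblas:jblas</artifact>
        <excludes>
          <exclude>lib/static/Windows/**</exclude>
        </excludes>
      </filter>
    </filters>
  </configuration>
</plugin>
```

As the description notes, after such an exclusion Windows users would presumably need to provide the gfortran/gcc runtime libraries on the shared library path themselves.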
[jira] [Updated] (SPARK-2451) Enable to load config file for Akka
[ https://issues.apache.org/jira/browse/SPARK-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-2451: - Component/s: Spark Core Priority: Minor (was: Major) Issue Type: Improvement (was: Bug) Enable to load config file for Akka --- Key: SPARK-2451 URL: https://issues.apache.org/jira/browse/SPARK-2451 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Kousuke Saruta Priority: Minor In the current implementation, we cannot have Akka load a config file. Sometimes we want to use a custom config file for Akka.
[jira] [Updated] (SPARK-4617) Fix spark.yarn.applicationMaster.waitTries doc
[ https://issues.apache.org/jira/browse/SPARK-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4617: - Priority: Minor (was: Major) Is the change here to remove the doc for this property? The current code says that this config is deprecated. Fix spark.yarn.applicationMaster.waitTries doc -- Key: SPARK-4617 URL: https://issues.apache.org/jira/browse/SPARK-4617 Project: Spark Issue Type: Bug Components: Documentation, YARN Affects Versions: 1.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Minor
[jira] [Updated] (SPARK-1061) allow Hadoop RDDs to be read w/ a partitioner
[ https://issues.apache.org/jira/browse/SPARK-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-1061: Component/s: Spark Core allow Hadoop RDDs to be read w/ a partitioner - Key: SPARK-1061 URL: https://issues.apache.org/jira/browse/SPARK-1061 Project: Spark Issue Type: New Feature Components: Spark Core Reporter: Imran Rashid Assignee: Imran Rashid Using partitioners to get narrow dependencies can save tons of time on a shuffle. However, after saving an RDD to HDFS and then reloading it, all partitioner information is lost. This means that you can never get a narrow dependency when loading data from Hadoop. I think we could get around this by: 1) having a modified version of HadoopRDD that kept track of the original part file (or maybe just prevent splits altogether ...) 2) adding an assumePartition(partitioner: Partitioner, verify: Boolean) function to RDD. It would create a new RDD with the exact same data, which just pretended that the given partitioner had been applied to it. And if verify=true, it could add a mapPartitionsWithIndex to check that each record was in the right partition. http://apache-spark-user-list.1001560.n3.nabble.com/setting-partitioners-with-hadoop-rdds-td976.html
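The verify=true check proposed above can be illustrated in plain Python (assumePartition is only a proposed Spark API, and this sketch deliberately avoids the Spark APIs themselves): every record found in partition i must hash back to partition i under the assumed partitioner, otherwise the assumption is wrong and should fail loudly.

```python
# Plain-Python sketch of the proposed verify=true check. hash_partition
# mimics a hash partitioner; verify_partitioning is the per-partition
# check that, in Spark, would run inside mapPartitionsWithIndex.

def hash_partition(key, num_partitions):
    """Assign a key to a partition by hash, like Spark's HashPartitioner."""
    return hash(key) % num_partitions

def verify_partitioning(partitions):
    """partitions: list of lists of (key, value) records, one inner list
    per partition index. Raises ValueError if any record sits in a
    partition other than the one the assumed partitioner would choose."""
    n = len(partitions)
    for index, records in enumerate(partitions):
        for key, _ in records:
            expected = hash_partition(key, n)
            if expected != index:
                raise ValueError(
                    f"key {key!r} found in partition {index}, expected {expected}")
    return True
```

If the data really was written out partition-by-partition by the same partitioner, the check passes and the narrow dependency is safe to assume; any misplaced record surfaces immediately rather than silently producing wrong joins.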
[jira] [Updated] (SPARK-5664) Restore stty settings when exiting for launching spark-shell from SBT
[ https://issues.apache.org/jira/browse/SPARK-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-5664: Component/s: Build Restore stty settings when exiting for launching spark-shell from SBT - Key: SPARK-5664 URL: https://issues.apache.org/jira/browse/SPARK-5664 Project: Spark Issue Type: Bug Components: Build Reporter: Liang-Chi Hsieh
[jira] [Commented] (SPARK-4820) Spark build encounters File name too long on some encrypted filesystems
[ https://issues.apache.org/jira/browse/SPARK-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311061#comment-14311061 ] Jian Zhou commented on SPARK-4820: -- Encountered this issue on encfs, and this workaround works. Spark build encounters File name too long on some encrypted filesystems - Key: SPARK-4820 URL: https://issues.apache.org/jira/browse/SPARK-4820 Project: Spark Issue Type: Bug Components: Build Reporter: Patrick Wendell This was reported by Luchesar Cekov on github along with a proposed fix. The fix has some potential downstream issues (it will modify the classnames) so until we understand better how many users are affected we aren't going to merge it. However, I'd like to include the issue and workaround here. If you encounter this issue please comment on the JIRA so we can assess the frequency. The issue produces this error: {code} [error] == Expanded type of tree == [error] [error] ConstantType(value = Constant(Throwable)) [error] [error] uncaught exception during compilation: java.io.IOException [error] File name too long [error] two errors found {code} The workaround in Maven is to add, under the compile options: {code} + <arg>-Xmax-classfile-name</arg> + <arg>128</arg> {code} In SBT, add: {code} + scalacOptions in Compile ++= Seq("-Xmax-classfile-name", "128"), {code}
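For context, the Maven args from the workaround would sit inside the Scala compiler plugin's configuration. The sketch below assumes the scala-maven-plugin (the plugin Spark's build used at the time); the exact surrounding pom structure is an assumption, only the two `<arg>` lines come from the workaround itself:

```xml
<!-- Assumed placement of the workaround: compiler args inside the
     scala-maven-plugin configuration in the project pom. -->
<plugin>
  <groupId>net.alchim31.maven</groupId>
  <artifactId>scala-maven-plugin</artifactId>
  <configuration>
    <args>
      <arg>-Xmax-classfile-name</arg>
      <arg>128</arg>
    </args>
  </configuration>
</plugin>
```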
[jira] [Updated] (SPARK-5524) Remove messy dependencies to log4j
[ https://issues.apache.org/jira/browse/SPARK-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5524: --- Component/s: Spark Core Remove messy dependencies to log4j -- Key: SPARK-5524 URL: https://issues.apache.org/jira/browse/SPARK-5524 Project: Spark Issue Type: Task Components: Spark Core Reporter: Jacek Lewandowski There are some tickets regarding loosening the dependency on Log4j, however some classes still use the following scheme: {code} if (Logger.getLogger(classOf[SomeClass]).getLevel == null) { Logger.getLogger(classOf[SomeClass]).setLevel(someLevel) } {code} This doesn't look good and makes it difficult to track why some logs are missing when you use Log4j, and why they are flooding when you use something else, like Logback. There is a Logging class which checks whether we use Log4j or not. Why not delegate all such invocations to it, where the Logging class could handle them properly, maybe considering more logging implementations?
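The delegation the reporter proposes can be illustrated with Python's standard logging module (a language swap purely for illustration; the ticket itself concerns Spark's Scala Logging trait and log4j): instead of scattering the "set a level only if none was configured" check across classes, route every such request through one helper.

```python
import logging

def set_default_level(logger_name, level):
    """Give a logger a default level only if none was configured yet,
    mirroring the if-getLevel-is-null-then-setLevel pattern from the
    ticket, but centralized in one place instead of per class."""
    logger = logging.getLogger(logger_name)
    if logger.level == logging.NOTSET:  # analogous to getLevel() == null
        logger.setLevel(level)

set_default_level("some.class", logging.WARNING)
set_default_level("some.class", logging.DEBUG)  # no-op: level already set
```

A single choke point like this is also where a backend check ("are we on log4j, logback, or something else?") could live, which is exactly the role the reporter suggests for Spark's Logging class.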
[jira] [Commented] (SPARK-5524) Remove messy dependencies to log4j
[ https://issues.apache.org/jira/browse/SPARK-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1430#comment-1430 ] Patrick Wendell commented on SPARK-5524: [~nchammas] I don't think this is related to the build, so I've changed the component.
[jira] [Updated] (SPARK-5524) Remove messy dependencies to log4j
[ https://issues.apache.org/jira/browse/SPARK-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5524: --- Component/s: (was: Build)
[jira] [Created] (SPARK-5673) Implement Streaming wrapper for all linear methods
Kirill A. Korinskiy created SPARK-5673: -- Summary: Implement Streaming wrapper for all linear methods Key: SPARK-5673 URL: https://issues.apache.org/jira/browse/SPARK-5673 Project: Spark Issue Type: New Feature Reporter: Kirill A. Korinskiy Currently Spark has streaming wrappers only for logistic and linear regression. Implementing wrappers for SVM, Lasso, and ridge regression would make the streaming API more useful.
[jira] [Resolved] (SPARK-4647) yarn-client mode reports success even though job fails
[ https://issues.apache.org/jira/browse/SPARK-4647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-4647. -- Resolution: Duplicate Also duplicates SPARK-3293 yarn-client mode reports success even though job fails -- Key: SPARK-4647 URL: https://issues.apache.org/jira/browse/SPARK-4647 Project: Spark Issue Type: Bug Components: YARN Reporter: SaintBacchus YARN's web UI shows SUCCEEDED when the driver throws an exception in yarn-client mode
[jira] [Resolved] (SPARK-5670) Spark artifacts compiled with Hadoop 1.x
[ https://issues.apache.org/jira/browse/SPARK-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5670. -- Resolution: Not a Problem This is another question that should be asked at user@, please. The artifacts published to Maven can only be compiled against one version of anything. Well, you can make a bunch of different artifacts with different {{classifier}}s, but here the idea is that it doesn't matter: you are always compiling against these artifacts as an API, and never relying on them for their transitive Hadoop dependency. You mark these dependencies as provided in your app, and when executed on a cluster, they use the correct dependencies for that cluster. Your error suggests that you have actually bundled old Hadoop code into your application. Don't do that; use provided scope. Spark artifacts compiled with Hadoop 1.x Key: SPARK-5670 URL: https://issues.apache.org/jira/browse/SPARK-5670 Project: Spark Issue Type: Bug Components: Java API Affects Versions: 1.2.0 Environment: Spark 1.2 Reporter: DeepakVohra Why are Spark artifacts available from Maven compiled with Hadoop 1.x while the Spark binaries for Hadoop 1.x are not available? Also CDH is not available for Hadoop 1.x. Using Hadoop 2.0.0 or Hadoop 2.3 with the Spark artifacts generates errors such as the following. Server IPC version 7 cannot communicate with client version 4 Server IPC version 9 cannot communicate with client version 4
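The provided-scope advice above can be shown as a pom fragment. This is a sketch for an application depending on Spark 1.2.0 (the version from the report); the artifact id assumes the Scala 2.10 build of that era:

```xml
<!-- Compile against Spark's API, but rely on the cluster's own
     Spark/Hadoop jars at runtime instead of bundling them into
     the application jar. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.2.0</version>
  <scope>provided</scope>
</dependency>
```

With provided scope, the "Server IPC version N cannot communicate with client version 4" mismatch goes away because no stale Hadoop client classes ride along inside the application jar.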
[jira] [Closed] (SPARK-3760) Add Twitter4j FilterQuery to spark streaming twitter API
[ https://issues.apache.org/jira/browse/SPARK-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Zhulenev closed SPARK-3760. -- Resolution: Won't Fix Add Twitter4j FilterQuery to spark streaming twitter API Key: SPARK-3760 URL: https://issues.apache.org/jira/browse/SPARK-3760 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 1.1.0 Reporter: Eugene Zhulenev Priority: Minor TwitterUtils.createStream(...) allows users to specify keywords that restrict the tweets that are returned. However, FilterQuery from Twitter4j has a bunch of other options, including the location filter that was asked for in SPARK-2788. The best solution would be to add an alternative createStream method that takes a FilterQuery argument instead of keywords. Pull Request: https://github.com/apache/spark/pull/2618
[jira] [Updated] (SPARK-5080) Expose more cluster resource information to user
[ https://issues.apache.org/jira/browse/SPARK-5080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-5080: Component/s: Spark Core Expose more cluster resource information to user Key: SPARK-5080 URL: https://issues.apache.org/jira/browse/SPARK-5080 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Rui Li It'll be useful if users can get detailed cluster resource info, e.g. granted/allocated executors, memory, and CPU. Such information is available via the WebUI, but it seems SparkContext doesn't have these APIs.
[jira] [Updated] (SPARK-5524) Remove messy dependencies to log4j
[ https://issues.apache.org/jira/browse/SPARK-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-5524: Component/s: Build
[jira] [Resolved] (SPARK-5671) Bump jets3t version from 0.9.0 to 0.9.2 in hadoop-2.3 and hadoop-2.4 profiles
[ https://issues.apache.org/jira/browse/SPARK-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-5671. --- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4454 [https://github.com/apache/spark/pull/4454] Bump jets3t version from 0.9.0 to 0.9.2 in hadoop-2.3 and hadoop-2.4 profiles - Key: SPARK-5671 URL: https://issues.apache.org/jira/browse/SPARK-5671 Project: Spark Issue Type: Improvement Components: Build Reporter: Josh Rosen Assignee: Josh Rosen Fix For: 1.3.0 Bumping the jets3t version from 0.9.0 to 0.9.2 for the hadoop-2.3 and hadoop-2.4 profiles fixes a dependency conflict that was causing UISeleniumSuite tests to fail with ClassNotFoundExceptions in the YARN builds. Jets3t release notes can be found here: http://www.jets3t.org/RELEASE_NOTES.html
[jira] [Updated] (SPARK-5156) Priority queue for cross application scheduling
[ https://issues.apache.org/jira/browse/SPARK-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-5156: Component/s: Scheduler Priority queue for cross application scheduling --- Key: SPARK-5156 URL: https://issues.apache.org/jira/browse/SPARK-5156 Project: Spark Issue Type: Wish Components: Scheduler Reporter: Timothy Wilder Priority: Minor FIFO is useful, but for many use cases, something more fine-grained would be excellent. If possible, I would love to see an optional priority queue for cross application scheduling. The gist of this would be that applications could be submitted with a priority, and the highest priority application would be executed first. A means to do CRUD operations on the queue would also be fantastic.
[jira] [Commented] (SPARK-5524) Remove messy dependencies to log4j
[ https://issues.apache.org/jira/browse/SPARK-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311122#comment-14311122 ] Nicholas Chammas commented on SPARK-5524: - Oh my bad. Thanks for the correction.
[jira] [Commented] (SPARK-5669) Spark assembly includes incompatibly licensed libgfortran, libgcc code via JBLAS
[ https://issues.apache.org/jira/browse/SPARK-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310931#comment-14310931 ] Apache Spark commented on SPARK-5669: - User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/4453
[jira] [Updated] (SPARK-5626) Spurious test failures due to NullPointerException in EasyMock test code
[ https://issues.apache.org/jira/browse/SPARK-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5626: - Component/s: Tests Spurious test failures due to NullPointerException in EasyMock test code Key: SPARK-5626 URL: https://issues.apache.org/jira/browse/SPARK-5626 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 1.3.0 Reporter: Josh Rosen Labels: flaky-test Attachments: consoleText.txt I've seen a few cases where a test failure will trigger a cascade of spurious failures when instantiating test suites that use EasyMock. Here's a sample symptom: {code} [info] CacheManagerSuite: [info] Exception encountered when attempting to run a suite with class name: org.apache.spark.CacheManagerSuite *** ABORTED *** (137 milliseconds) [info] java.lang.NullPointerException: [info] at org.objenesis.strategy.StdInstantiatorStrategy.newInstantiatorOf(StdInstantiatorStrategy.java:52) [info] at org.objenesis.ObjenesisBase.getInstantiatorOf(ObjenesisBase.java:90) [info] at org.objenesis.ObjenesisBase.newInstance(ObjenesisBase.java:73) [info] at org.objenesis.ObjenesisHelper.newInstance(ObjenesisHelper.java:43) [info] at org.easymock.internal.ObjenesisClassInstantiator.newInstance(ObjenesisClassInstantiator.java:26) [info] at org.easymock.internal.ClassProxyFactory.createProxy(ClassProxyFactory.java:219) [info] at org.easymock.internal.MocksControl.createMock(MocksControl.java:59) [info] at org.easymock.EasyMock.createMock(EasyMock.java:103) [info] at org.scalatest.mock.EasyMockSugar$class.mock(EasyMockSugar.scala:267) [info] at org.apache.spark.CacheManagerSuite.mock(CacheManagerSuite.scala:28) [info] at org.apache.spark.CacheManagerSuite$$anonfun$1.apply$mcV$sp(CacheManagerSuite.scala:40) [info] at org.apache.spark.CacheManagerSuite$$anonfun$1.apply(CacheManagerSuite.scala:38) [info] at org.apache.spark.CacheManagerSuite$$anonfun$1.apply(CacheManagerSuite.scala:38) [info] at 
org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:195) [info] at org.apache.spark.CacheManagerSuite.runTest(CacheManagerSuite.scala:28) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) [info] at scala.collection.immutable.List.foreach(List.scala:318) [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) [info] at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) [info] at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) [info] at org.scalatest.Suite$class.run(Suite.scala:1424) [info] at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) [info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:545) [info] at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) [info] at org.apache.spark.CacheManagerSuite.org$scalatest$BeforeAndAfter$$super$run(CacheManagerSuite.scala:28) [info] at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241) [info] at org.apache.spark.CacheManagerSuite.run(CacheManagerSuite.scala:28) [info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) [info] at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:294) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:284) [info] at 
java.util.concurrent.FutureTask.run(FutureTask.java:262) [info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [info] at java.lang.Thread.run(Thread.java:745) {code} This is from https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26852/consoleFull.
[jira] [Updated] (SPARK-4442) Move common unit test utilities into their own package / module
[ https://issues.apache.org/jira/browse/SPARK-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4442: - Component/s: Tests Move common unit test utilities into their own package / module --- Key: SPARK-4442 URL: https://issues.apache.org/jira/browse/SPARK-4442 Project: Spark Issue Type: Improvement Components: Tests Reporter: Josh Rosen Priority: Minor We should move generally-useful unit test fixtures / utility methods to their own test utilities package / module to make them easier to find / use. See https://github.com/apache/spark/pull/3121#discussion-diff-20413659 for one example of this.
[jira] [Updated] (SPARK-4424) Clean up all SparkContexts in unit tests so that spark.driver.allowMultipleContexts can be false
[ https://issues.apache.org/jira/browse/SPARK-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4424: - Component/s: Tests Clean up all SparkContexts in unit tests so that spark.driver.allowMultipleContexts can be false Key: SPARK-4424 URL: https://issues.apache.org/jira/browse/SPARK-4424 Project: Spark Issue Type: Improvement Components: Tests Reporter: Josh Rosen Priority: Minor This is a followup JIRA to SPARK-4180 to make sure that we finish refactoring the unit tests so that all SparkContexts are properly cleaned up; since the current tests don't perform proper cleanup, we currently need to set {{spark.driver.allowMultipleContexts=true}} in the test configuration. It may be best to do this as part of a larger refactoring / cleanup of our test code to use cleaner test fixture patterns.
[jira] [Updated] (SPARK-4746) integration tests should be separated from faster unit tests
[ https://issues.apache.org/jira/browse/SPARK-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4746: - Component/s: Tests integration tests should be separated from faster unit tests Key: SPARK-4746 URL: https://issues.apache.org/jira/browse/SPARK-4746 Project: Spark Issue Type: Bug Components: Tests Reporter: Imran Rashid Priority: Trivial Currently there isn't a good way for a developer to skip the longer integration tests. This can slow down local development. See http://apache-spark-developers-list.1001551.n3.nabble.com/Spurious-test-failures-testing-best-practices-td9560.html One option is to use scalatest's notion of test tags to tag all integration tests, so they could easily be skipped -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
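The tagging approach suggested for SPARK-4746 can be sketched as follows. This is an editor's illustration using Python's unittest (Spark itself would define a scalatest Tag and exclude it via the runner's `-l` flag); the environment variable and helper names are hypothetical.

```python
# Editor's sketch: tag slow integration tests so fast local runs skip them.
# RUN_INTEGRATION and the `integration` decorator are illustrative names,
# not anything in the Spark codebase.
import os
import unittest

RUN_INTEGRATION = os.environ.get("RUN_INTEGRATION") == "1"

def integration(test_func):
    """Mark a test as integration-only; skipped unless explicitly enabled."""
    return unittest.skipUnless(RUN_INTEGRATION, "integration tests disabled")(test_func)

class ExampleSuite(unittest.TestCase):
    def test_fast_unit(self):
        self.assertEqual(1 + 1, 2)

    @integration
    def test_slow_integration(self):
        # a real integration test would spin up a local cluster here
        self.assertTrue(True)
```

A developer's default run then executes only the fast tests, while CI sets the environment variable to run everything.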
[jira] [Commented] (SPARK-5625) Spark binaries do not incude Spark Core
[ https://issues.apache.org/jira/browse/SPARK-5625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311012#comment-14311012 ] DeepakVohra commented on SPARK-5625: Spark artifacts are not in the Spark binaries/assembly. Spark binaries do not incude Spark Core --- Key: SPARK-5625 URL: https://issues.apache.org/jira/browse/SPARK-5625 Project: Spark Issue Type: Bug Components: Java API Affects Versions: 1.2.0 Environment: CDH4 Reporter: DeepakVohra Spark binaries for CDH 4 do not include the Spark Core Jar. http://spark.apache.org/downloads.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5668) spark_ec2.py region parameter could be either mandatory or its value displayed
[ https://issues.apache.org/jira/browse/SPARK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311054#comment-14311054 ] Nicholas Chammas commented on SPARK-5668: - This sounds good to me, Miguel. I've been bitten by this before. I favor option 1. Workable defaults are generally convenient to have, so I wouldn't want to make {{--region}} mandatory. Also, that would break the tool for those who have built scripts that invoke {{spark-ec2}} without specifying the region. spark_ec2.py region parameter could be either mandatory or its value displayed -- Key: SPARK-5668 URL: https://issues.apache.org/jira/browse/SPARK-5668 Project: Spark Issue Type: Improvement Components: EC2 Affects Versions: 1.2.0, 1.3.0, 1.4.0 Reporter: Miguel Peralvo Priority: Minor If the region parameter is not specified when invoking spark-ec2 (spark-ec2.py behind the scenes) it defaults to us-east-1. When the cluster doesn't belong to that region, after showing the Searching for existing cluster Spark... message, it causes an ERROR: Could not find any existing cluster exception because it doesn't find your cluster in the default region. As it doesn't tell you anything about the region, it can be a small headache for new users. In http://stackoverflow.com/questions/21171576/why-does-spark-ec2-fail-with-error-could-not-find-any-existing-cluster, Dmitriy Selivanov explains it. I propose that: 1. Either we make the search message a little bit more informative with something like Searching for existing cluster Spark in region + opts.region. 2. Or we remove us-east-1 as the default and make the --region parameter mandatory. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
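Option 1 from SPARK-5668 amounts to folding the region into the search message so a wrong-region lookup explains itself. This is an editor's sketch; the function name and defaulting logic are illustrative, not the actual spark_ec2.py code.

```python
# Editor's sketch of option 1: make the region visible in the status line
# while keeping the convenient us-east-1 default. Names are hypothetical.
DEFAULT_REGION = "us-east-1"

def search_message(cluster_name, region=None):
    region = region or DEFAULT_REGION  # preserve the existing default
    return "Searching for existing cluster %s in region %s..." % (cluster_name, region)

print(search_message("Spark"))                # the default region is now visible
print(search_message("Spark", "eu-west-1"))   # explicit regions show up too
```

With this, the subsequent "Could not find any existing cluster" error at least points a new user at the region mismatch.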
[jira] [Commented] (SPARK-5672) Don't return `ERROR 500` when have missing args
[ https://issues.apache.org/jira/browse/SPARK-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311059#comment-14311059 ] Apache Spark commented on SPARK-5672: - User 'catap' has created a pull request for this issue: https://github.com/apache/spark/pull/4239 Don't return `ERROR 500` when have missing args --- Key: SPARK-5672 URL: https://issues.apache.org/jira/browse/SPARK-5672 Project: Spark Issue Type: Bug Components: Web UI Reporter: Kirill A. Korinskiy Spark web UI return HTTP ERROR 500 when GET arguments is missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5625) Spark binaries do not incude Spark Core
[ https://issues.apache.org/jira/browse/SPARK-5625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311076#comment-14311076 ] DeepakVohra commented on SPARK-5625: The spark-assembly-1.2.0-hadoop2.0.0-mr1-cdh4.2.0.jar is not a valid archive. http://s763.photobucket.com/user/dvohra10/media/SparkAssembly_zps4319294c.jpg.html?o=0 The spark-1.2.0-bin-cdh4.tgz is downloaded from http://www.apache.org/dyn/closer.cgi/spark/spark-1.2.0/spark-1.2.0-bin-cdh4.tgz Spark binaries do not incude Spark Core --- Key: SPARK-5625 URL: https://issues.apache.org/jira/browse/SPARK-5625 Project: Spark Issue Type: Bug Components: Java API Affects Versions: 1.2.0 Environment: CDH4 Reporter: DeepakVohra Spark binaries for CDH 4 do not include the Spark Core Jar. http://spark.apache.org/downloads.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5053) Test maintenance branches on Jenkins using SBT
[ https://issues.apache.org/jira/browse/SPARK-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-5053. --- Resolution: Fixed Assignee: Josh Rosen I'm going to resolve this as fixed, since the scope of this JIRA was to create the maintenance SBT builds and to get them into a serviceable state. The current problems with some of those builds are outside the original scope of this JIRA and will be addressed separately. Test maintenance branches on Jenkins using SBT -- Key: SPARK-5053 URL: https://issues.apache.org/jira/browse/SPARK-5053 Project: Spark Issue Type: New Feature Components: Project Infra Reporter: Josh Rosen Assignee: Josh Rosen Priority: Blocker We need to create Jenkins jobs to test maintenance branches using SBT. The current Maven jobs for backport branches do not run the same checks that the pull request builder / SBT builds do (e.g. MiMa checks, PySpark, RAT, etc.) which means that cherry-picking backports can silently break things and we'll only discover it once PRs that are explicitly opened against those branches fail tests; this long delay between introducing test failures and detecting them is a huge productivity issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5672) Don't return `ERROR 500` when have missing args
Kirill A. Korinskiy created SPARK-5672: -- Summary: Don't return `ERROR 500` when have missing args Key: SPARK-5672 URL: https://issues.apache.org/jira/browse/SPARK-5672 Project: Spark Issue Type: Bug Components: Web UI Reporter: Kirill A. Korinskiy The Spark web UI returns HTTP ERROR 500 when a GET argument is missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
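The fix SPARK-5672 asks for is the standard pattern of validating request parameters up front. A minimal sketch in plain Python (Spark's actual UI handlers are Scala on Jetty; this only illustrates the 400-instead-of-500 idea, and all names are hypothetical):

```python
# Editor's sketch: check required GET parameters before rendering, and answer
# 400 Bad Request with a readable message rather than letting a missing-key
# lookup escape as an HTTP 500 server error.
def render_page(params, required=("id",)):
    missing = [name for name in required if name not in params]
    if missing:
        return 400, "Missing required parameter(s): %s" % ", ".join(missing)
    return 200, "Details for %s" % params["id"]

assert render_page({}) == (400, "Missing required parameter(s): id")
assert render_page({"id": "stage-3"})[0] == 200
```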
[jira] [Updated] (SPARK-578) Fix interpreter code generation to only capture needed dependencies
[ https://issues.apache.org/jira/browse/SPARK-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-578: Priority: Major Fix interpreter code generation to only capture needed dependencies --- Key: SPARK-578 URL: https://issues.apache.org/jira/browse/SPARK-578 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Matei Zaharia -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-540) Add API to customize in-memory representation of RDDs
[ https://issues.apache.org/jira/browse/SPARK-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-540: Priority: Minor Add API to customize in-memory representation of RDDs - Key: SPARK-540 URL: https://issues.apache.org/jira/browse/SPARK-540 Project: Spark Issue Type: New Feature Components: Spark Core Reporter: Matei Zaharia Priority: Minor Right now the choice between serialized caching and just Java objects in dev is fine, but it might be cool to also support structures such as column-oriented storage through arrays of primitives without forcing it through the serialization interface. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-573) Clarify semantics of the parallelized closures
[ https://issues.apache.org/jira/browse/SPARK-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-573: Priority: Minor Clarify semantics of the parallelized closures -- Key: SPARK-573 URL: https://issues.apache.org/jira/browse/SPARK-573 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: tjhunter Priority: Minor I do not think there is any guideline about which features of scala are allowed/forbidden in the closure that gets sent to the remote nodes. Two examples I have are a return statement and updating mutable variables of singletons. Ideally, a compiler plugin could give an error at compile time, but a good error message at run time would be good also. Are there any other cases that should not be allowed? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5191) Pyspark: scheduler hangs when importing a standalone pyspark app
[ https://issues.apache.org/jira/browse/SPARK-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-5191. --- Resolution: Not a Problem I'm going to resolve this as Not a Problem since the problem here lies with the user code and not Spark itself (we might be able to fix this, but we can't guarantee that invalid user programs will work correctly). Pyspark: scheduler hangs when importing a standalone pyspark app Key: SPARK-5191 URL: https://issues.apache.org/jira/browse/SPARK-5191 Project: Spark Issue Type: Bug Components: PySpark, Scheduler Affects Versions: 1.0.2, 1.1.1, 1.3.0, 1.2.1 Reporter: Daniel Liu In a.py: {code} from pyspark import SparkContext sc = SparkContext(local, test spark) rdd = sc.parallelize(range(1, 10)) print rdd.count() {code} In b.py: {code} from a import * {code} {{python a.py}} runs fine {{python b.py}} will hang at TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool {{./bin/spark-submit --py-files a.py b.py}} has the same problem -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
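The hang in SPARK-5191 stems from a.py creating a SparkContext at import time, so `from a import *` starts driver-side work inside b.py's process. The conventional remedy is to keep driver setup behind a `__main__` guard. A hedged sketch of the rewritten a.py (the lazy pyspark import and fallback message are editor additions so the snippet loads on a machine without Spark):

```python
# a.py reworked so that importing it has no side effects: no SparkContext is
# created until the module is run directly.
def run(master="local", app_name="test spark"):
    from pyspark import SparkContext  # deferred; only needed when actually run
    sc = SparkContext(master, app_name)
    try:
        return sc.parallelize(range(1, 10)).count()
    finally:
        sc.stop()  # always release the context

if __name__ == "__main__":  # skipped when `from a import *` executes in b.py
    try:
        print(run())
    except ImportError:
        print("pyspark not installed; nothing to run")
```

With this shape, `python a.py` still runs the job, while `from a import *` merely imports the `run` function.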
[jira] [Created] (SPARK-5670) Spark artifacts compiled with Hadoop 1.x
DeepakVohra created SPARK-5670: -- Summary: Spark artifacts compiled with Hadoop 1.x Key: SPARK-5670 URL: https://issues.apache.org/jira/browse/SPARK-5670 Project: Spark Issue Type: Bug Components: Java API Affects Versions: 1.2.0 Environment: Spark 1.2 Reporter: DeepakVohra Why are Spark artifacts available from Maven compiled with Hadoop 1.x while the Spark binaries for Hadoop 1.x are not available? Also CDH is not available for Hadoop 1.x. Using Hadoop 2.0.0 or Hadoop 2.3 with Spark artifacts generates error such as the following. Server IPC version 7 cannot communicate with client version 4 Server IPC version 9 cannot communicate with client version 4 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5670) Spark artifacts compiled with Hadoop 1.x
[ https://issues.apache.org/jira/browse/SPARK-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311031#comment-14311031 ] DeepakVohra commented on SPARK-5670: Not using Maven to run the Spark application to be able to set provided scope. Running Spark application as local master URL. Spark artifacts compiled with Hadoop 1.x Key: SPARK-5670 URL: https://issues.apache.org/jira/browse/SPARK-5670 Project: Spark Issue Type: Bug Components: Java API Affects Versions: 1.2.0 Environment: Spark 1.2 Reporter: DeepakVohra Why are Spark artifacts available from Maven compiled with Hadoop 1.x while the Spark binaries for Hadoop 1.x are not available? Also CDH is not available for Hadoop 1.x. Using Hadoop 2.0.0 or Hadoop 2.3 with Spark artifacts generates error such as the following. Server IPC version 7 cannot communicate with client version 4 Server IPC version 9 cannot communicate with client version 4 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-1142) Allow adding jars on app submission, outside of code
[ https://issues.apache.org/jira/browse/SPARK-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-1142: Component/s: Spark Submit Allow adding jars on app submission, outside of code Key: SPARK-1142 URL: https://issues.apache.org/jira/browse/SPARK-1142 Project: Spark Issue Type: Improvement Components: Spark Submit Affects Versions: 0.9.0 Reporter: Sandy Pérez González Assignee: Sandy Pérez González yarn-standalone mode supports an option that allows adding jars that will be distributed on the cluster with job submission. Providing similar functionality for other app submission modes will allow the spark-app script proposed in SPARK-1126 to support an add-jars option that works for every submit mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4383) Delay scheduling doesn't work right when jobs have tasks with different locality levels
[ https://issues.apache.org/jira/browse/SPARK-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-4383: Component/s: Scheduler Delay scheduling doesn't work right when jobs have tasks with different locality levels --- Key: SPARK-4383 URL: https://issues.apache.org/jira/browse/SPARK-4383 Project: Spark Issue Type: Bug Components: Scheduler Affects Versions: 1.0.2, 1.1.0 Reporter: Kay Ousterhout Copied from mailing list discussion: Now our application will load data from hdfs in the same spark cluster, it will get NODE_LOCAL and RACK_LOCAL level tasks during loading stage, if the tasks in loading stage have same locality level, ether NODE_LOCAL or RACK_LOCAL it works fine. But if the tasks in loading stage get mixed locality level, such as 3 NODE_LOCAL tasks, and 2 RACK_LOCAL tasks, then the TaskSetManager of loading stage will submit the 3 NODE_LOCAL tasks as soon as resources were offered, then wait for spark.locality.wait.node, which was set to 30 minutes, the 2 RACK_LOCAL tasks will wait 30 minutes even though resources are available. Fixing this is quite tricky -- do we need to track the locality level individually for each task? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5524) Remove messy dependencies to log4j
[ https://issues.apache.org/jira/browse/SPARK-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311121#comment-14311121 ] Nicholas Chammas commented on SPARK-5524: - Oh my bad. Thanks for the correction. Remove messy dependencies to log4j -- Key: SPARK-5524 URL: https://issues.apache.org/jira/browse/SPARK-5524 Project: Spark Issue Type: Task Components: Spark Core Reporter: Jacek Lewandowski There are some tickets regarding loosening the dependency on Log4j, however some classes still use the following scheme: {code} if (Logger.getLogger(classOf[SomeClass]).getLevel == null) { Logger.getLogger(classOf[SomeClass]).setLevel(someLevel) } {code} This doesn't look good and make it difficult to track why some logs are missing when you use Log4j and why they are flooding when you use something else, like logback. There is a Logging class which checks whether we use Log4j or not. Why not delegate all of such invocations, where the Logging class could handle it properly, maybe considering more logging implementations? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
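The centralization SPARK-5524 asks for can be illustrated with Python's logging module as an analogue (Spark's actual code is Scala against Log4j; the helper name here is hypothetical): one function owns the "only set a level if the user hasn't configured one" check instead of scattering it across classes.

```python
# Editor's sketch, Python-logging analogue of the pattern in the ticket:
# touch a logger's level only when nothing has configured it yet.
import logging

def ensure_level(name, default=logging.WARNING):
    logger = logging.getLogger(name)
    if logger.level == logging.NOTSET:  # untouched by any user configuration
        logger.setLevel(default)
    return logger
```

Callers then ask the helper for a logger rather than repeating the null-level check, which is also the natural seam for supporting backends other than Log4j.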
[jira] [Commented] (SPARK-5625) Spark binaries do not incude Spark Core
[ https://issues.apache.org/jira/browse/SPARK-5625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311022#comment-14311022 ] Sean Owen commented on SPARK-5625: -- The assembly jar is not extracted. It's a jar file like any other. It contains the core classes, as you can see with {{jar tf}}. Have you tried that? The binary distribution does not contain individual module artifacts. Those are published in Maven, since by themselves, they are only relevant as Maven artifacts. They are put together into an assembly for binary distributions. This is the thing you would use when actually deploying Spark on a cluster. Spark binaries do not incude Spark Core --- Key: SPARK-5625 URL: https://issues.apache.org/jira/browse/SPARK-5625 Project: Spark Issue Type: Bug Components: Java API Affects Versions: 1.2.0 Environment: CDH4 Reporter: DeepakVohra Spark binaries for CDH 4 do not include the Spark Core Jar. http://spark.apache.org/downloads.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5625) Spark binaries do not incude Spark Core
[ https://issues.apache.org/jira/browse/SPARK-5625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311029#comment-14311029 ] DeepakVohra commented on SPARK-5625: Thanks, yes the assembly jar has the Spark artifact classes. Shall re-test as to why the Spark classes are not getting found when a .scala file is compiled even though the spark-1.2.0-bin-cdh4/lib/* is in the classpath. Spark binaries do not incude Spark Core --- Key: SPARK-5625 URL: https://issues.apache.org/jira/browse/SPARK-5625 Project: Spark Issue Type: Bug Components: Java API Affects Versions: 1.2.0 Environment: CDH4 Reporter: DeepakVohra Spark binaries for CDH 4 do not include the Spark Core Jar. http://spark.apache.org/downloads.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4808) Spark fails to spill with small number of large objects
[ https://issues.apache.org/jira/browse/SPARK-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-4808: Component/s: Spark Core Spark fails to spill with small number of large objects --- Key: SPARK-4808 URL: https://issues.apache.org/jira/browse/SPARK-4808 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.2, 1.1.0, 1.2.0, 1.2.1 Reporter: Dennis Lawler Spillable's maybeSpill does not allow spill to occur until at least 1000 elements have been spilled, and then will only evaluate spill every 32nd element thereafter. When there is a small number of very large items being tracked, out-of-memory conditions may occur. I suspect that this and the every-32nd-element behavior was to reduce the impact of the estimateSize() call. This method was extracted into SizeTracker, which implements its own exponential backup for size estimation, so now we are only avoiding using the resulting estimated size. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
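The failure mode described in SPARK-4808 is easy to see in a toy model. The real logic is Scala (Spillable.maybeSpill); the 1000-element and every-32nd thresholds below are taken from the report, and everything else is an editor's illustration:

```python
# Toy model of the spill gate described above: with a small number of very
# large records, the memory check may never even be evaluated.
TRACK_THRESHOLD = 1000   # no spill considered before this many elements
CHECK_EVERY = 32         # afterwards, only every 32nd element is checked

def maybe_spill(elements_read, current_bytes, memory_limit):
    eligible = elements_read > TRACK_THRESHOLD and elements_read % CHECK_EVERY == 0
    return eligible and current_bytes > memory_limit

GB = 2 ** 30
# 10 records of 1 GB each against a 2 GB limit: far over budget, yet no call
# ever passes the element-count gate, so nothing spills -> potential OOM.
assert not any(maybe_spill(n, n * GB, 2 * GB) for n in range(1, 11))
assert maybe_spill(1024, 3 * GB, 2 * GB)  # past both gates, the spill fires
```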
[jira] [Commented] (SPARK-5363) Spark 1.2 freeze without error notification
[ https://issues.apache.org/jira/browse/SPARK-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311050#comment-14311050 ] Nicholas Chammas commented on SPARK-5363: - [~TJKlein] - Can you provide more information about the environment in which you see this error? Can you also come up with a simple repro script? Spark 1.2 freeze without error notification --- Key: SPARK-5363 URL: https://issues.apache.org/jira/browse/SPARK-5363 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.2.0 Reporter: Tassilo Klein Assignee: Davies Liu Priority: Critical After a number of calls to a map().collect() statement Spark freezes without reporting any error. Within the map a large broadcast variable is used. The freezing can be avoided by setting 'spark.python.worker.reuse = false' (Spark 1.2) or using an earlier version, however, at the prize of low speed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5628) Add option to return spark-ec2 version
[ https://issues.apache.org/jira/browse/SPARK-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311055#comment-14311055 ] Nicholas Chammas commented on SPARK-5628: - We still need a backport to 1.2.2 for this issue. Add option to return spark-ec2 version -- Key: SPARK-5628 URL: https://issues.apache.org/jira/browse/SPARK-5628 Project: Spark Issue Type: Improvement Components: EC2 Reporter: Nicholas Chammas Assignee: Nicholas Chammas Priority: Minor Labels: backport-needed Fix For: 1.3.0, 1.2.2, 1.4.0 We need a {{--version}} option for {{spark-ec2}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5625) Spark binaries do not incude Spark Core
[ https://issues.apache.org/jira/browse/SPARK-5625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311073#comment-14311073 ] DeepakVohra commented on SPARK-5625: The spark-assembly-1.2.0-hadoop2.0.0-mr1-cdh4.2.0.jar has too many classes, which may be causing classloading issue. The classes do not even get extracted with WinZip and generate the following error. Error: too many entries in central directory according to end of central directory info. Spark binaries do not incude Spark Core --- Key: SPARK-5625 URL: https://issues.apache.org/jira/browse/SPARK-5625 Project: Spark Issue Type: Bug Components: Java API Affects Versions: 1.2.0 Environment: CDH4 Reporter: DeepakVohra Spark binaries for CDH 4 do not include the Spark Core Jar. http://spark.apache.org/downloads.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
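A jar is an ordinary zip archive, so besides `jar tf` one can inspect the assembly with Python's zipfile module even when WinZip fails on the entry count. Editor's sketch: a real run would pass the on-disk path of the spark-assembly jar; a tiny in-memory stand-in is built here so the example is self-contained.

```python
# List and count class entries in a jar without extracting it. The stand-in
# archive below is illustrative; substitute the real assembly jar's path.
import io
import zipfile

def count_classes(jar_file, prefix="org/apache/spark/"):
    with zipfile.ZipFile(jar_file) as jar:
        return sum(1 for n in jar.namelist()
                   if n.startswith(prefix) and n.endswith(".class"))

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:          # build the tiny stand-in jar
    jar.writestr("org/apache/spark/SparkContext.class", b"")
    jar.writestr("META-INF/MANIFEST.MF", b"Manifest-Version: 1.0\n")

print(count_classes(buf))
```

This confirms whether the core classes are present independently of any archive GUI's limits.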
[jira] [Commented] (SPARK-3431) Parallelize Scala/Java test execution
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311135#comment-14311135 ] Nicholas Chammas commented on SPARK-3431: - [~srowen] - Have you tried anything recently with parallelizing tests with Maven? Parallelize Scala/Java test execution - Key: SPARK-3431 URL: https://issues.apache.org/jira/browse/SPARK-3431 Project: Spark Issue Type: Improvement Components: Build Reporter: Nicholas Chammas Assignee: Nicholas Chammas Attachments: SPARK-3431-srowen-attempt.patch Running all the tests in {{dev/run-tests}} takes up to 2 hours. A common strategy to cut test time down is to parallelize the execution of the tests. Doing that may in turn require some prerequisite changes to be made to how certain tests run. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1967) Using parallelize method to create RDD, wordcount app just hanging there without errors or warnings
[ https://issues.apache.org/jira/browse/SPARK-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1967. -- Resolution: Cannot Reproduce Using parallelize method to create RDD, wordcount app just hanging there without errors or warnings --- Key: SPARK-1967 URL: https://issues.apache.org/jira/browse/SPARK-1967 Project: Spark Issue Type: Bug Affects Versions: 0.9.1 Environment: Ubuntu-12.04, single machine spark standalone, 8 core, 8G mem, spark 0.9.1, java-1.7 Reporter: Min Li I was trying the parallelize method to create RDD. I used Java. And it's a simple wordcount program, except that I first read the input into memory and then use the parallelize method to create the RDD, rather than the default textFile method in the given example. Pseudo codes: JavaSparkContext ctx = new JavaSparkContext($SparkMasterURL, $NAME, $SparkHome, $jars); List<String> input = #read lines from input file and form an ArrayList<String> JavaRDD<String> lines = ctx.parallelize(input); //followed by wordcount above is not working. JavaRDD<String> lines = ctx.textFile(file); //followed by wordcount this is working The log is: 14/05/29 16:18:43 INFO Slf4jLogger: Slf4jLogger started 14/05/29 16:18:43 INFO Remoting: Starting remoting 14/05/29 16:18:43 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@spark:55224] 14/05/29 16:18:43 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@spark:55224] 14/05/29 16:18:43 INFO SparkEnv: Registering BlockManagerMaster 14/05/29 16:18:43 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140529161843-836a 14/05/29 16:18:43 INFO MemoryStore: MemoryStore started with capacity 1056.0 MB. 
14/05/29 16:18:43 INFO ConnectionManager: Bound socket to port 42942 with id = ConnectionManagerId(spark,42942) 14/05/29 16:18:43 INFO BlockManagerMaster: Trying to register BlockManager 14/05/29 16:18:43 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager spark:42942 with 1056.0 MB RAM 14/05/29 16:18:43 INFO BlockManagerMaster: Registered BlockManager 14/05/29 16:18:43 INFO HttpServer: Starting HTTP Server 14/05/29 16:18:43 INFO HttpBroadcast: Broadcast server started at http://10.227.119.185:43522 14/05/29 16:18:43 INFO SparkEnv: Registering MapOutputTracker 14/05/29 16:18:43 INFO HttpFileServer: HTTP File server directory is /tmp/spark-3704a621-789c-4d97-b1fc-9654236dba3e 14/05/29 16:18:43 INFO HttpServer: Starting HTTP Server 14/05/29 16:18:43 INFO SparkUI: Started Spark Web UI at http://spark:4040 14/05/29 16:18:44 INFO SparkContext: Added JAR /home/maxmin/tmp/spark-test-1.0-SNAPSHOT-jar-with-dependencies.jar at http://10.227.119.185:55286/jars/spark-test-1.0-SNAPSHOT-jar-with-dependencies.jar with timestamp 1401394724045 14/05/29 16:18:44 INFO AppClient$ClientActor: Connecting to master spark://spark:7077... 14/05/29 16:18:44 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20140529161844-0001 14/05/29 16:18:44 INFO AppClient$ClientActor: Executor added: app-20140529161844-0001/0 on worker-20140529155406-spark-59658 (spark:59658) with 8 cores The app is hanging here forever. And spark:8080 spark:4040 are not showing any strange info. The Spark Stages page shows the Active Stages is reduceByKey, tasks: Succeeded/Total is 0/2. I've also tried directly call lines.count after parallelize, and the app will stuck at the count stage. I've also tried to use some static give string list and use the parallelize to create rdd. This time, the app is still hanging but the stages show nothing active. And the log is similar. I used spark-0.9.1 and used default spark-env.sh. In the slaves file I have only one host. 
I used maven to compile a fat jar with spark specified as provided. I modified the run-example script to submit the jar. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-625) Client hangs when connecting to standalone cluster using wrong address
[ https://issues.apache.org/jira/browse/SPARK-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-625. -- Resolution: Fixed Let's resolve this as Fixed for now. Reducing Akka's sensitivity to hostnames is a more general issue and we may have a fix for this in the future by either upgrading to a version of Akka that differentiates between bound and advertised addresses or by replacing Akka with a different communications layer. I don't think we've observed the hang indefinitely behavior described in this ticket for many versions, so I think this should be safe to close. Client hangs when connecting to standalone cluster using wrong address -- Key: SPARK-625 URL: https://issues.apache.org/jira/browse/SPARK-625 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.7.0, 0.7.1, 0.8.0 Reporter: Josh Rosen Priority: Minor I launched a standalone cluster on my laptop, connecting the workers to the master using my machine's public IP address (128.32.*.*:7077). If I try to connect spark-shell to the master using spark://0.0.0.0:7077, it successfully brings up a Scala prompt but hangs when I try to run a job. 
From the standalone master's log, it looks like the client's messages are being dropped without the client discovering that the connection has failed: {code} 12/11/27 14:00:52 ERROR NettyRemoteTransport(null): dropping message RegisterJob(JobDescription(Spark shell)) for non-local recipient akka://spark@0.0.0.0:7077/user/Master at akka://spark@128.32.*.*:7077 local is akka://spark@128.32.*.*:7077 12/11/27 14:00:52 ERROR NettyRemoteTransport(null): dropping message DaemonMsgWatch(Actor[akka://spark@128.32.*.*:57518/user/$a],Actor[akka://spark@0.0.0.0:7077/user/Master]) for non-local recipient akka://spark@0.0.0.0:7077/remote at akka://spark@128.32.*.*:7077 local is akka://spark@128.32.*.*:7077 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3760) Add Twitter4j FilterQuery to spark streaming twitter API
[ https://issues.apache.org/jira/browse/SPARK-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311010#comment-14311010 ] Sean Owen commented on SPARK-3760: -- The PR was abandoned. Is this WontFix? It kind of overlaps with the functionality of SPARK-2788 which should still really make it over the line. Collectively does that provide enough functionality from this basic example? Add Twitter4j FilterQuery to spark streaming twitter API Key: SPARK-3760 URL: https://issues.apache.org/jira/browse/SPARK-3760 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 1.1.0 Reporter: Eugene Zhulenev Priority: Minor TwitterUtils.createStream(...) allows users to specify keywords that restrict the tweets that are returned. However FilterQuery from Twitter4j has a bunch of other options including location that was asked in SPARK-2788. Best solution will be add alternative createStream method with FilterQuery as argument instead of keywords. Pull Request: https://github.com/apache/spark/pull/2618 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5671) Bump jets3t version from 0.9.0 to 0.9.2 in hadoop-2.3 and hadoop-2.4 profiles
[ https://issues.apache.org/jira/browse/SPARK-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311017#comment-14311017 ] Apache Spark commented on SPARK-5671: - User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/4454 Bump jets3t version from 0.9.0 to 0.9.2 in hadoop-2.3 and hadoop-2.4 profiles - Key: SPARK-5671 URL: https://issues.apache.org/jira/browse/SPARK-5671 Project: Spark Issue Type: Improvement Components: Build Reporter: Josh Rosen Assignee: Josh Rosen Bumping the jets3t version from 0.9.0 to 0.9.2 for the hadoop-2.3 and hadoop-2.4 profiles fixes a dependency conflict issue that was causing UISeleniumSuite tests to fail with ClassNotFoundExceptions in the YARN builds. Jets3t release notes can be found here: http://www.jets3t.org/RELEASE_NOTES.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-4383) Delay scheduling doesn't work right when jobs have tasks with different locality levels
[ https://issues.apache.org/jira/browse/SPARK-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout resolved SPARK-4383. --- Resolution: Fixed Fix Version/s: 1.3.0 Delay scheduling doesn't work right when jobs have tasks with different locality levels --- Key: SPARK-4383 URL: https://issues.apache.org/jira/browse/SPARK-4383 Project: Spark Issue Type: Bug Components: Scheduler Affects Versions: 1.0.2, 1.1.0 Reporter: Kay Ousterhout Fix For: 1.3.0 Copied from mailing list discussion: Our application loads data from HDFS in the same Spark cluster, so the loading stage gets both NODE_LOCAL and RACK_LOCAL tasks. If the tasks in the loading stage all have the same locality level, either NODE_LOCAL or RACK_LOCAL, it works fine. But if the loading stage gets mixed locality levels, such as 3 NODE_LOCAL tasks and 2 RACK_LOCAL tasks, then the TaskSetManager of the loading stage submits the 3 NODE_LOCAL tasks as soon as resources are offered and then waits for spark.locality.wait.node, which was set to 30 minutes; the 2 RACK_LOCAL tasks wait 30 minutes even though resources are available. Fixing this is quite tricky -- do we need to track the locality level individually for each task? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
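The single-timer behavior described in the report can be sketched as follows. This is an illustrative simplification, not Spark's actual TaskSetManager code; the object and method names are assumptions:

```scala
object DelaySchedulingSketch {
  // Locality ladder, most to least local.
  val levels = Vector("PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY")

  // One shared wait (ms) before relaxing to the next level -- the analogue
  // of spark.locality.wait.node set to 30 minutes in the report above.
  val localityWaitMs = 30L * 60 * 1000

  // With a single timer for the whole task set, RACK_LOCAL tasks cannot
  // launch until the full wait elapses, even when resources sit idle.
  def allowedLevelIndex(lastLaunchMs: Long, nowMs: Long, currentIdx: Int): Int =
    if (nowMs - lastLaunchMs >= localityWaitMs) math.min(currentIdx + 1, levels.size - 1)
    else currentIdx
}
```

Tracking a wait timer per locality level (or per task, as the resolution asks) instead of one shared timer is what lets the RACK_LOCAL tasks launch without waiting out the NODE_LOCAL budget.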
[jira] [Updated] (SPARK-5668) spark_ec2.py region parameter could be either mandatory or its value displayed
[ https://issues.apache.org/jira/browse/SPARK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-5668: Labels: starter (was: ) spark_ec2.py region parameter could be either mandatory or its value displayed -- Key: SPARK-5668 URL: https://issues.apache.org/jira/browse/SPARK-5668 Project: Spark Issue Type: Improvement Components: EC2 Affects Versions: 1.2.0, 1.3.0, 1.4.0 Reporter: Miguel Peralvo Priority: Minor Labels: starter If the region parameter is not specified when invoking spark-ec2 (spark_ec2.py behind the scenes), it defaults to us-east-1. When the cluster doesn't belong to that region, after showing the "Searching for existing cluster Spark..." message, it fails with an "ERROR: Could not find any existing cluster" exception because it doesn't find your cluster in the default region. As it doesn't tell you anything about the region, it can be a small headache for new users. In http://stackoverflow.com/questions/21171576/why-does-spark-ec2-fail-with-error-could-not-find-any-existing-cluster, Dmitriy Selivanov explains it. I propose that: 1. Either we make the search message a little bit more informative with something like "Searching for existing cluster Spark in region " + opts.region. 2. Or we remove us-east-1 as the default and make the --region parameter mandatory. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5425) ConcurrentModificationException during SparkConf creation
[ https://issues.apache.org/jira/browse/SPARK-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-5425. --- Resolution: Fixed Fix Version/s: 1.2.2 Target Version/s: (was: 1.2.2) I've merged this into `branch-1.2` (1.2.2), completing the backports. ConcurrentModificationException during SparkConf creation - Key: SPARK-5425 URL: https://issues.apache.org/jira/browse/SPARK-5425 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.1, 1.2.0 Reporter: Jacek Lewandowski Assignee: Jacek Lewandowski Fix For: 1.3.0, 1.1.2, 1.2.2 This fragment of code: {code} if (loadDefaults) { // Load any spark.* system properties for ((k, v) <- System.getProperties.asScala if k.startsWith("spark.")) { settings(k) = v } } {code} causes {noformat} ERROR 09:43:15 SparkMaster service caused error in state STARTING java.util.ConcurrentModificationException: null at java.util.Hashtable$Enumerator.next(Hashtable.java:1167) ~[na:1.7.0_60] at scala.collection.convert.Wrappers$JPropertiesWrapper$$anon$3.next(Wrappers.scala:458) ~[scala-library-2.10.4.jar:na] at scala.collection.convert.Wrappers$JPropertiesWrapper$$anon$3.next(Wrappers.scala:454) ~[scala-library-2.10.4.jar:na] at scala.collection.Iterator$class.foreach(Iterator.scala:727) ~[scala-library-2.10.4.jar:na] at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) ~[scala-library-2.10.4.jar:na] at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) ~[scala-library-2.10.4.jar:na] at scala.collection.AbstractIterable.foreach(Iterable.scala:54) ~[scala-library-2.10.4.jar:na] at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) ~[scala-library-2.10.4.jar:na] at org.apache.spark.SparkConf.<init>(SparkConf.scala:53) ~[spark-core_2.10-1.2.1_dse-20150121.075638-2.jar:1.2.1_dse-SNAPSHOT] at org.apache.spark.SparkConf.<init>(SparkConf.scala:47) ~[spark-core_2.10-1.2.1_dse-20150121.075638-2.jar:1.2.1_dse-SNAPSHOT] {noformat} when there
is another thread which modifies system properties at the same time. This bug https://issues.scala-lang.org/browse/SI-7775 is somehow related to the issue and shows that the problem has also been found elsewhere. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
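A sketch of the fix direction -- iterating over a snapshot rather than the live, concurrently mutable Hashtable -- could look like this. It is a simplified illustration, not the actual SparkConf patch:

```scala
object SparkConfSnapshotSketch {
  // Cloning java.util.Properties is synchronized on the underlying
  // Hashtable, so a concurrent System.setProperty in another thread can
  // no longer throw ConcurrentModificationException mid-iteration.
  def sparkProperties(): Map[String, String] = {
    val snapshot = System.getProperties.clone().asInstanceOf[java.util.Properties]
    val keys = snapshot.stringPropertyNames().iterator()
    var result = Map.empty[String, String]
    while (keys.hasNext) {
      val k = keys.next()
      if (k.startsWith("spark.")) result += (k -> snapshot.getProperty(k))
    }
    result
  }
}
```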
[jira] [Resolved] (SPARK-985) Support Job Cancellation on Mesos Scheduler
[ https://issues.apache.org/jira/browse/SPARK-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-985. -- Resolution: Fixed Fix Version/s: 1.1.1 1.2.0 I'm pretty sure that this was resolved by SPARK-3597 in 1.1.1 and 1.2.0: now that MesosSchedulerBackend implements killTask, I think we now have support for job cancellation on Mesos. I'm going to mark this as Resolved, but feel free to re-open if there's still work to be done. Support Job Cancellation on Mesos Scheduler --- Key: SPARK-985 URL: https://issues.apache.org/jira/browse/SPARK-985 Project: Spark Issue Type: Improvement Components: Mesos, Scheduler Affects Versions: 0.9.0 Reporter: Josh Rosen Fix For: 1.2.0, 1.1.1 https://github.com/apache/incubator-spark/pull/29 added job cancellation but may still need support for Mesos scheduler backends: Quote: {quote} This looks good except that MesosSchedulerBackend isn't yet calling Mesos's killTask. Do you want to add that too or are you planning to push it till later? I don't think it's a huge change. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5671) Bump jets3t version from 0.9.0 to 0.9.2 in hadoop-2.3 and hadoop-2.4 profiles
Josh Rosen created SPARK-5671: - Summary: Bump jets3t version from 0.9.0 to 0.9.2 in hadoop-2.3 and hadoop-2.4 profiles Key: SPARK-5671 URL: https://issues.apache.org/jira/browse/SPARK-5671 Project: Spark Issue Type: Improvement Components: Build Reporter: Josh Rosen Assignee: Josh Rosen Bumping the jets3t version from 0.9.0 to 0.9.2 for the hadoop-2.3 and hadoop-2.4 profiles fixes a dependency conflict issue that was causing UISeleniumSuite tests to fail with ClassNotFoundExceptions in the YARN builds. Jets3t release notes can be found here: http://www.jets3t.org/RELEASE_NOTES.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
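The bump itself amounts to a per-profile property change in the root pom. A sketch of roughly what the hadoop-2.4 profile looks like after the change -- the property and profile names follow Spark's pom conventions, but this is illustrative, not the actual diff:

```xml
<!-- root pom.xml (sketch): pin jets3t per Hadoop profile -->
<profile>
  <id>hadoop-2.4</id>
  <properties>
    <hadoop.version>2.4.0</hadoop.version>
    <jets3t.version>0.9.2</jets3t.version>
  </properties>
</profile>
```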
[jira] [Updated] (SPARK-5363) Spark 1.2 freeze without error notification
[ https://issues.apache.org/jira/browse/SPARK-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-5363: Component/s: PySpark Spark 1.2 freeze without error notification --- Key: SPARK-5363 URL: https://issues.apache.org/jira/browse/SPARK-5363 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.2.0 Reporter: Tassilo Klein Assignee: Davies Liu Priority: Critical After a number of calls to a map().collect() statement, Spark freezes without reporting any error. Within the map a large broadcast variable is used. The freezing can be avoided by setting 'spark.python.worker.reuse = false' (Spark 1.2) or using an earlier version, however, at the price of lower speed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5175) bug in updating counters when starting multiple workers/supervisors in actor-based receiver
[ https://issues.apache.org/jira/browse/SPARK-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-5175: Component/s: (was: Spark Core) Streaming bug in updating counters when starting multiple workers/supervisors in actor-based receiver --- Key: SPARK-5175 URL: https://issues.apache.org/jira/browse/SPARK-5175 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.2.0 Reporter: Nan Zhu When starting multiple workers (ActorReceiver.scala), the counters in it are not updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5259) Fix endless retry stage by add task equal() and hashcode() to avoid stage.pendingTasks not empty while stage map output is available
[ https://issues.apache.org/jira/browse/SPARK-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-5259: Component/s: Spark Core Fix endless stage retry by adding task equals() and hashCode() to avoid stage.pendingTasks staying non-empty while the stage's map output is available - Key: SPARK-5259 URL: https://issues.apache.org/jira/browse/SPARK-5259 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.1, 1.2.0 Reporter: SuYan 1. While a shuffle stage is being retried, there may be 2 task sets running; call them taskSet0.0 and taskSet0.1. taskSet0.1 re-runs taskSet0.0's uncompleted tasks. If taskSet0.0 finishes all the tasks that taskSet0.1 has not yet completed but that cover its partitions, then the stage's isAvailable becomes true: {code} def isAvailable: Boolean = { if (!isShuffleMap) { true } else { numAvailableOutputs == numPartitions } } {code} but stage.pendingTasks is not empty, which blocks registering the mapStatus in mapOutputTracker. When a task completes successfully, pendingTasks -= task removes the Task by reference, because Task does not override hashCode() and equals(), while numAvailableOutputs is counted by partition ID.
here is the testcase to prove: {code} test("Make sure mapStage.pendingtasks is set()" + " while MapStage.isAvailable is true while stage was retry") { val firstRDD = new MyRDD(sc, 6, Nil) val firstShuffleDep = new ShuffleDependency(firstRDD, null) val firstShuyffleId = firstShuffleDep.shuffleId val shuffleMapRdd = new MyRDD(sc, 6, List(firstShuffleDep)) val shuffleDep = new ShuffleDependency(shuffleMapRdd, null) val shuffleId = shuffleDep.shuffleId val reduceRdd = new MyRDD(sc, 2, List(shuffleDep)) submit(reduceRdd, Array(0, 1)) complete(taskSets(0), Seq( (Success, makeMapStatus("hostB", 1)), (Success, makeMapStatus("hostB", 2)), (Success, makeMapStatus("hostC", 3)), (Success, makeMapStatus("hostB", 4)), (Success, makeMapStatus("hostB", 5)), (Success, makeMapStatus("hostC", 6)) )) complete(taskSets(1), Seq( (Success, makeMapStatus("hostA", 1)), (Success, makeMapStatus("hostB", 2)), (Success, makeMapStatus("hostA", 1)), (Success, makeMapStatus("hostB", 2)), (Success, makeMapStatus("hostA", 1)) )) runEvent(ExecutorLost("exec-hostA")) runEvent(CompletionEvent(taskSets(1).tasks(0), Resubmitted, null, null, null, null)) runEvent(CompletionEvent(taskSets(1).tasks(2), Resubmitted, null, null, null, null)) runEvent(CompletionEvent(taskSets(1).tasks(0), FetchFailed(null, firstShuyffleId, -1, 0, "Fetch Mata data failed"), null, null, null, null)) scheduler.resubmitFailedStages() runEvent(CompletionEvent(taskSets(1).tasks(0), Success, makeMapStatus("hostC", 1), null, null, null)) runEvent(CompletionEvent(taskSets(1).tasks(2), Success, makeMapStatus("hostC", 1), null, null, null)) runEvent(CompletionEvent(taskSets(1).tasks(4), Success, makeMapStatus("hostC", 1), null, null, null)) runEvent(CompletionEvent(taskSets(1).tasks(5), Success, makeMapStatus("hostB", 2), null, null, null)) val stage = scheduler.stageIdToStage(taskSets(1).stageId) assert(stage.attemptId == 2) assert(stage.isAvailable) assert(stage.pendingTasks.size == 0) } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe,
e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
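The reference-equality problem the reporter describes is easy to demonstrate in isolation. In this sketch (illustrative class names, not Spark's actual Task hierarchy), only the value-based identity lets the pending set drain when a retried task object completes:

```scala
class RefTask(val stageId: Int, val partitionId: Int)   // default reference equality
case class KeyedTask(stageId: Int, partitionId: Int)    // equals/hashCode by (stage, partition)

object PendingTasksSketch {
  def demo(): (Int, Int) = {
    // Without equals()/hashCode(), the retried task object for partition 0
    // does not match the original one, so pendingTasks never drains.
    val refPending = scala.collection.mutable.HashSet(new RefTask(1, 0))
    refPending -= new RefTask(1, 0)        // no-op: different object reference

    val keyedPending = scala.collection.mutable.HashSet(KeyedTask(1, 0))
    keyedPending -= KeyedTask(1, 0)        // removes: same (stageId, partitionId)

    (refPending.size, keyedPending.size)   // (1, 0)
  }
}
```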
[jira] [Updated] (SPARK-5175) bug in updating counters when starting multiple workers/supervisors in actor-based receiver
[ https://issues.apache.org/jira/browse/SPARK-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-5175: Component/s: Spark Core bug in updating counters when starting multiple workers/supervisors in actor-based receiver --- Key: SPARK-5175 URL: https://issues.apache.org/jira/browse/SPARK-5175 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Nan Zhu When starting multiple workers (ActorReceiver.scala), the counters in it are not updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-5524) Remove messy dependencies to log4j
[ https://issues.apache.org/jira/browse/SPARK-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-5524: Comment: was deleted (was: Oh my bad. Thanks for the correction.) Remove messy dependencies to log4j -- Key: SPARK-5524 URL: https://issues.apache.org/jira/browse/SPARK-5524 Project: Spark Issue Type: Task Components: Spark Core Reporter: Jacek Lewandowski There are some tickets regarding loosening the dependency on Log4j, however some classes still use the following scheme: {code} if (Logger.getLogger(classOf[SomeClass]).getLevel == null) { Logger.getLogger(classOf[SomeClass]).setLevel(someLevel) } {code} This doesn't look good and makes it difficult to track why some logs are missing when you use Log4j, and why they are flooding in when you use something else, like Logback. There is a Logging class which checks whether we use Log4j or not. Why not delegate all such invocations to the Logging class, which could handle them properly, perhaps taking more logging implementations into account? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
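The delegation the reporter proposes could be centralized along these lines. java.util.logging stands in for log4j here so the sketch stays dependency-free, and the object and method names are assumptions, not an existing Spark API:

```scala
import java.util.logging.{Level, Logger}

// One place that applies a default level only when none is configured,
// instead of scattering the null-check across classes. A real version
// would branch on the detected logging backend, as the Logging class does.
object LoggingFacade {
  def setLevelIfUnset(clazz: Class[_], level: Level): Unit = {
    val logger = Logger.getLogger(clazz.getName)
    if (logger.getLevel == null) logger.setLevel(level)
  }
}
```

A caller would then write `LoggingFacade.setLevelIfUnset(classOf[SomeClass], Level.INFO)` and never touch the backend directly.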
[jira] [Updated] (SPARK-5673) Implement Streaming wrapper for all linear methods
[ https://issues.apache.org/jira/browse/SPARK-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill A. Korinskiy updated SPARK-5673: --- Description: Spark currently has streaming wrappers for Logistic and Linear regression only, so implementing wrappers for SVM, Lasso and Ridge Regression would make the streaming API more useful. was: Now spark had only streaming wrapper for Logistic and Linear regressions only. So, implement wrapper for SVM, Lasso and Ridge Regression will make streaming fashion more useful. Implement Streaming wrapper for all linear methods - Key: SPARK-5673 URL: https://issues.apache.org/jira/browse/SPARK-5673 Project: Spark Issue Type: New Feature Reporter: Kirill A. Korinskiy Spark currently has streaming wrappers for Logistic and Linear regression only, so implementing wrappers for SVM, Lasso and Ridge Regression would make the streaming API more useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org