[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1151#issuecomment-47620833 Build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: Update SQLConf.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1272#issuecomment-47621874 I added some synchronized blocks; please see if it is thread safe, and Jenkins should test this once more.
[GitHub] spark pull request: SPARK-2332 [build] add exclusion for old servl...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1271#issuecomment-47621924 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16278/
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/1151#issuecomment-47621967 Hey @YanjieGao, you need to resolve all those conflicts by hand first and then `git add` the changed files. These two interactive online Git tutorials might be very helpful :-)
1. http://pcottle.github.io/learnGitBranching/
2. https://try.github.io/levels/1/challenges/1
And thanks for working on this!
[GitHub] spark pull request: SPARK-2332 [build] add exclusion for old servl...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1271#issuecomment-47621923 Merged build finished. All automated tests passed.
[GitHub] spark pull request: Workaround in Spark for ConcurrentModification...
Github user colorant commented on the pull request: https://github.com/apache/spark/pull/1000#issuecomment-47622801 It seems that this workaround does not work for me on Hadoop 2.2.0; I still hit this problem from within the synchronized block with the latest trunk code:
```
java.util.ConcurrentModificationException
	java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
	java.util.HashMap$KeyIterator.next(HashMap.java:828)
	java.util.AbstractCollection.addAll(AbstractCollection.java:305)
	java.util.HashSet.<init>(HashSet.java:100)
	org.apache.hadoop.conf.Configuration.<init>(Configuration.java:554)
	org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:439)
	org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:144)
	org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:189)
	org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:184)
	org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:93)
	org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:261)
	org.apache.spark.rdd.RDD.iterator(RDD.scala:228)
	org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
	org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:261)
	org.apache.spark.rdd.RDD.iterator(RDD.scala:228)
	org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
	org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:261)
	org.apache.spark.rdd.RDD.iterator(RDD.scala:228)
	org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
	org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:261)
	org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:59)
	org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
	org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:112)
	org.apache.spark.scheduler.Task.run(Task.scala:51)
	org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
	java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	java.lang.Thread.run(Thread.java:662)
```
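The top of the stack trace above is Hadoop's `Configuration` copy constructor iterating a `HashMap` that another thread is mutating; `HashMap` iterators are fail-fast, so any structural modification during traversal throws `ConcurrentModificationException`. A minimal single-threaded Java sketch of the same failure mode (the keys are illustrative, not real Hadoop properties):

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FailFastDemo {
    // Returns true when the fail-fast iterator detects a structural
    // modification, as in the stack trace above.
    public static boolean modificationDetected() {
        Map<String, String> conf = new HashMap<>();
        conf.put("some.key", "value");

        // Grab an iterator, then structurally modify the map behind its
        // back, as a concurrent writer in another task would.
        Iterator<String> it = conf.keySet().iterator();
        conf.put("another.key", "value");

        try {
            it.next(); // checks the map's modCount and fails fast
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(modificationDetected()); // prints "true"
    }
}
```

In the real failure the writer is another task calling `getJobConf` on the same node-shared `Configuration`, so the exception appears only intermittently.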
[GitHub] spark pull request: Workaround in Spark for ConcurrentModification...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1000#issuecomment-47622999 @colorant if you can look into it and submit a fix, that'd be great! Thanks for reporting this.
[GitHub] spark pull request: Update SQLConf.scala
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1272#issuecomment-47624045 This is indeed threadsafe, but perhaps overeager. I think we should aim for get()s to be relatively fast, and I think we can avoid extra synchronization there.
[GitHub] spark pull request: Update SQLConf.scala
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1272#discussion_r14391299
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
```
@@ -50,11 +50,12 @@ trait SQLConf {

   /** ** SQLConf functionality methods */
   @transient
-  private val settings = java.util.Collections.synchronizedMap(
-    new java.util.HashMap[String, String]())
+  private val settings = new java.util.concurrent.ConcurrentHashMap[String, String]()

   def set(props: Properties): Unit = {
-    props.asScala.foreach { case (k, v) => this.settings.put(k, v) }
+    settings.synchronized {
```
--- End diff --
Perhaps we can remove this synchronized, I don't think we care about the consistency guarantees of inserting multiple properties at once :)
[GitHub] spark pull request: Update SQLConf.scala
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1272#discussion_r14391324
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
```
@@ -64,20 +65,21 @@ trait SQLConf {
   }

   def get(key: String): String = {
-    if (!settings.containsKey(key)) {
-      throw new NoSuchElementException(key)
-    }
     settings.get(key)
   }

   def get(key: String, defaultValue: String): String = {
-    if (!settings.containsKey(key)) defaultValue else settings.get(key)
+    settings.synchronized {
+      if (!settings.containsKey(key)) defaultValue else settings.get(key)
```
--- End diff --
Let's use the ConcurrentHashMap-safe paradigm of
```scala
Option(settings.get(key)).getOrElse(defaultValue)
```
Note that ConcurrentHashMap does not allow null values, so this is safe.
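The suggested pattern works because `ConcurrentHashMap` rejects null keys and values, so a null return from `get` unambiguously means "absent". A Java sketch of the same idiom (`java.util.Optional` plays the role of Scala's `Option`; the class and field names are illustrative):

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class ConfLookup {
    private final ConcurrentHashMap<String, String> settings = new ConcurrentHashMap<>();

    public void set(String key, String value) {
        settings.put(key, value);
    }

    // Equivalent of Option(settings.get(key)).getOrElse(defaultValue):
    // a single map read, no extra synchronization, and no window between
    // a containsKey check and the get.
    public String get(String key, String defaultValue) {
        return Optional.ofNullable(settings.get(key)).orElse(defaultValue);
    }
}
```

The two-step `containsKey` + `get` version it replaces is racy without a lock: another thread can remove the key between the two calls, making `get` return null even though `containsKey` just returned true.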
[GitHub] spark pull request: Update SQLConf.scala
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1272#discussion_r14391342
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
```
@@ -64,20 +65,21 @@ trait SQLConf {
   }

   def get(key: String): String = {
-    if (!settings.containsKey(key)) {
-      throw new NoSuchElementException(key)
-    }
     settings.get(key)
   }

   def get(key: String, defaultValue: String): String = {
-    if (!settings.containsKey(key)) defaultValue else settings.get(key)
+    settings.synchronized {
+      if (!settings.containsKey(key)) defaultValue else settings.get(key)
+    }
   }

   def getAll: Array[(String, String)] = settings.asScala.toArray

   def getOption(key: String): Option[String] = {
-    if (!settings.containsKey(key)) None else Some(settings.get(key))
+    settings.synchronized {
+      if (!settings.containsKey(key)) None else Some(settings.get(key))
```
--- End diff --
Similarly, here, just
```scala
Option(settings.get(key))
```
[GitHub] spark pull request: Update SQLConf.scala
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/1272#issuecomment-47624200 With the new `synchronized`s you added in usage, I don't think we need `ConcurrentHashMap` any more. Maybe just a simple `HashMap` is enough.
[GitHub] spark pull request: Update SQLConf.scala
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1272#discussion_r14391366
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
```
@@ -64,20 +65,21 @@ trait SQLConf {
   }

   def get(key: String): String = {
-    if (!settings.containsKey(key)) {
-      throw new NoSuchElementException(key)
-    }
     settings.get(key)
   }

   def get(key: String, defaultValue: String): String = {
-    if (!settings.containsKey(key)) defaultValue else settings.get(key)
+    settings.synchronized {
+      if (!settings.containsKey(key)) defaultValue else settings.get(key)
```
--- End diff --
(Note that switching to this allows us to avoid adding the synchronized {} blocks.)
[GitHub] spark pull request: SPARK-2332 [build] add exclusion for old servl...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1271#issuecomment-47625138 Thanks. I'm merging this in master.
[GitHub] spark pull request: SPARK-2332 [build] add exclusion for old servl...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1271
[GitHub] spark pull request: Update SQLConf.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1272#issuecomment-47625269 Thanks @aarondav, I have modified the code according to your comment; please help me check whether it is proper.
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1151#issuecomment-47625603 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16279/
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1151#issuecomment-47625602 Build finished.
[GitHub] spark pull request: [SPARK-2327] [SQL] Fix nullabilities of Join/G...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1266#issuecomment-47625708 Merged build triggered.
[GitHub] spark pull request: [SPARK-2327] [SQL] Fix nullabilities of Join/G...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1266#issuecomment-47625713 Merged build started.
[GitHub] spark pull request: Update SQLConf.scala
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1272#issuecomment-47626045 Can you undo the indent spacing change?
[GitHub] spark pull request: Update SQLConf.scala
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1272#discussion_r14391912
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
```
@@ -50,11 +50,10 @@ trait SQLConf {

   /** ** SQLConf functionality methods */
   @transient
-  private val settings = java.util.Collections.synchronizedMap(
-    new java.util.HashMap[String, String]())
+  private val settings = new java.util.concurrent.ConcurrentHashMap[String, String]()

   def set(props: Properties): Unit = {
-    props.asScala.foreach { case (k, v) => this.settings.put(k, v) }
+   props.asScala.foreach { case (k, v) => this.settings.put(k, v) }
```
--- End diff --
can you reset the change
[GitHub] spark pull request: Update SQLConf.scala
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/1272#discussion_r14392109
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
```
@@ -64,20 +63,17 @@ trait SQLConf {
   }

   def get(key: String): String = {
-    if (!settings.containsKey(key)) {
```
--- End diff --
Probably adding the checking logic is better.
[GitHub] spark pull request: Update SQLConf.scala
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1272#issuecomment-47626937 And I also saw this code:
```scala
def toDebugString: String = {
  settings.synchronized {
    settings.asScala.toArray.sorted.map { case (k, v) => s"$k=$v" }.mkString("\n")
  }
}
```
Should we also remove this synchronized block, since we use ConcurrentHashMap instead?
[GitHub] spark pull request: Update SQLConf.scala
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1272#discussion_r14392261
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
```
@@ -50,8 +50,7 @@ trait SQLConf {

   /** ** SQLConf functionality methods */
   @transient
-  private val settings = java.util.Collections.synchronizedMap(
-    new java.util.HashMap[String, String]())
+  private val settings = new java.util.concurrent.ConcurrentHashMap[String, String]()
```
--- End diff --
We should not be using ConcurrentHashMap because this will be a very low-contention code path. For a low-contention code path, ConcurrentHashMap is a very poor choice (as a matter of fact it'll likely be much slower than synchronized, and use a lot more memory).
[GitHub] spark pull request: Update SQLConf.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1272#issuecomment-47627343 Hi @rxin, I have removed the indent spacing on
```scala
def set(props: Properties): Unit = {
  props.asScala.foreach { case (k, v) => this.settings.put(k, v) }
}
```
Please help me check whether it is proper.
[GitHub] spark pull request: Update SQLConf.scala
Github user baishuo commented on a diff in the pull request: https://github.com/apache/spark/pull/1272#discussion_r14392768
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
```
@@ -50,8 +50,7 @@ trait SQLConf {

   /** ** SQLConf functionality methods */
   @transient
-  private val settings = java.util.Collections.synchronizedMap(
-    new java.util.HashMap[String, String]())
+  private val settings = new java.util.concurrent.ConcurrentHashMap[String, String]()
```
--- End diff --
Reverted to Collections.synchronizedMap.
[GitHub] spark pull request: Workaround in Spark for ConcurrentModification...
Github user colorant commented on the pull request: https://github.com/apache/spark/pull/1000#issuecomment-47628775 @rxin correct me if I am wrong. The problem here is that the broadcastedConf lives in the per-task HadoopRDD, so synchronizing on the method or on the broadcastedConf itself only guards within that task. But when you call broadcastedConf.value.value, you actually get the value saved in the memory store (when memory is enough, with the deserialized approach), and that conf object should be the same one per node: nothing prevents different tasks from getting the same conf object, and passing that shared conf object to JobConf(conf) leads to this problem. If I am right, then synchronizing on broadcastedConf.value.value might solve it. I am not 100% sure the cross-task reference behaves as I described above. What do you think? I will try to modify the code and see if it works; if it does, I can open a quick pull request.
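The proposal above amounts to serializing every `new JobConf(conf)` call on the node-shared conf object, so that no copy is taken while a writer is mutating it. A hedged sketch of the idea (a plain `Map` stands in for the Hadoop `Configuration`; all names are illustrative, not Spark's actual code):

```java
import java.util.HashMap;
import java.util.Map;

public class SharedConfClone {
    // One conf object shared by all tasks on a node, as the broadcast's
    // memory-store value would be in the scenario described above.
    private final Map<String, String> sharedConf = new HashMap<>();

    public void set(String key, String value) {
        // Writers take the same monitor the cloner takes below.
        synchronized (sharedConf) {
            sharedConf.put(key, value);
        }
    }

    // Analogue of broadcastedConf.value.value.synchronized { new JobConf(conf) }:
    // copy under the shared object's monitor so no writer mutates the map
    // mid-copy and triggers ConcurrentModificationException.
    public Map<String, String> newTaskConf() {
        synchronized (sharedConf) {
            return new HashMap<>(sharedConf);
        }
    }
}
```

Note the usual caveat with monitor-based schemes: this only helps if every code path that mutates or copies the shared conf synchronizes on the same object, which is why it is discussed here as a workaround rather than a complete fix.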
[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...
GitHub user colorant opened a pull request: https://github.com/apache/spark/pull/1273 [SPARK-1097] Workaround Hadoop conf ConcurrentModification issue Workaround Hadoop conf ConcurrentModification issue You can merge this pull request into a Git repository by running: $ git pull https://github.com/colorant/spark hadoopRDD Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1273.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1273 commit 37c13b30a80793e05dd2300f9accbc29db17a336 Author: Raymond Liu raymond@intel.com Date: 2014-07-01T08:33:33Z Workaround Hadoop conf ConcurrentModification issue
[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1273#issuecomment-47631084 Merged build triggered.
[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1273#issuecomment-47631092 Merged build started.
[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...
Github user colorant commented on the pull request: https://github.com/apache/spark/pull/1273#issuecomment-47631094 as described in #1000
[GitHub] spark pull request: Workaround in Spark for ConcurrentModification...
Github user colorant commented on the pull request: https://github.com/apache/spark/pull/1000#issuecomment-47631409 @rxin, PR at #1273. I ran around 10 batches of jobs with that patch and did not see this problem happen again. Without the patch, on my nodes, it does happen from time to time, roughly every 1-3 jobs.
[GitHub] spark pull request: [SPARK-2185] Emit warning when task size excee...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1149#issuecomment-47632471 Thanks. I've merged this in master.
[GitHub] spark pull request: [SPARK-2185] Emit warning when task size excee...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1149
[GitHub] spark pull request: [SPARK-2327] [SQL] Fix nullabilities of Join/G...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1266#issuecomment-47634108 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16280/
[GitHub] spark pull request: [SPARK-2327] [SQL] Fix nullabilities of Join/G...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1266#issuecomment-47634107 Merged build finished.
[GitHub] spark pull request: Update SQLConf.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1272#issuecomment-47634643 Hi @rxin, what is the proper way to modify this? Use `settings.synchronized { ... }` to ensure thread safety?
[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1273#issuecomment-47635045 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16281/
[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1273#issuecomment-47635044 Merged build finished. All automated tests passed.
[GitHub] spark pull request: [SPARK-2327] [SQL] Fix nullabilities of Join/G...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1266#issuecomment-47635811 Oops, the #1268-related errors passed, but others failed...
[GitHub] spark pull request: [SPARK-2324] SparkContext should not exit dire...
GitHub user YanTangZhai opened a pull request: https://github.com/apache/spark/pull/1274 [SPARK-2324] SparkContext should not exit directly when spark.local.dir is a list of multiple paths and one of them has an error. The spark.local.dir is configured as a list of multiple paths, e.g. /data1/sparkenv/local,/data2/sparkenv/local. If the disk data2 of the driver node has an error, the application exits since DiskBlockManager exits directly at createLocalDirs. If the disk data2 of a worker node has an error, the executor exits as well. DiskBlockManager should not exit directly at createLocalDirs if one of the spark.local.dir paths has an error. Since spark.local.dir has multiple paths, a problem with one of them should not affect the overall situation. I think DiskBlockManager could ignore the bad directory at createLocalDirs. You can merge this pull request into a Git repository by running: $ git pull https://github.com/YanTangZhai/spark SPARK-2324 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1274.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1274 commit df086731952c669e12673fd673d829b9fdd790a2 Author: yantangzhai tyz0...@163.com Date: 2014-07-01T10:39:46Z [SPARK-2324] SparkContext should not exit directly when spark.local.dir is a list of multiple paths and one of them has error
[GitHub] spark pull request: Use the Executor's ClassLoader in sc.objectFil...
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/181#issuecomment-47647756 I was so slow, Bogdan has already fixed this in #821. Anyway, here's the belated test. It's probably still useful to avoid regressions. I tested the test by reverting Bogdan's change, and the test then fails with `ClassNotFoundException: FileSuiteObjectFileTest`. Either both the fix and the test are correct, or they are both bugged :).
[GitHub] spark pull request: Update SQLConf.scala
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1272#discussion_r14407762 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala --- @@ -50,8 +50,7 @@ trait SQLConf { /** ** SQLConf functionality methods */ @transient - private val settings = java.util.Collections.synchronizedMap( -new java.util.HashMap[String, String]()) + private val settings = new java.util.concurrent.ConcurrentHashMap[String, String]() --- End diff -- @rxin I think the performance distinction is extremely minor in this case, as there is only one ConcurrentHashMap. ConcurrentHashMap's API tends to be nicer to use, though, as people may not realize that iteration over a SynchronizedMap is not thread-safe, as in the current implementation of SQLConf. As @baishuo mentioned, if we use synchronizedMap we'll have to add settings.synchronized {} in a few places now.
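The iteration hazard aarondav describes can be seen with a small sketch (plain JDK Java, not Spark code; the config key is just an example): a `Collections.synchronizedMap` wrapper guards individual calls, but iterating it safely still requires a manual `synchronized` block, while `ConcurrentHashMap` iterators are weakly consistent and need no external locking.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MapIterationDemo {
    public static void main(String[] args) {
        // synchronizedMap: each put/get is guarded, but iteration must be
        // wrapped in a synchronized block on the map itself, or a concurrent
        // writer can cause a ConcurrentModificationException.
        Map<String, String> synced = Collections.synchronizedMap(new HashMap<>());
        synced.put("spark.sql.shuffle.partitions", "200");
        synchronized (synced) { // required for safe iteration
            for (Map.Entry<String, String> e : synced.entrySet()) {
                System.out.println(e.getKey() + "=" + e.getValue());
            }
        }

        // ConcurrentHashMap: iterators are weakly consistent, so no extra
        // locking is needed and concurrent writes never throw mid-iteration.
        Map<String, String> concurrent = new ConcurrentHashMap<>();
        concurrent.put("spark.sql.shuffle.partitions", "200");
        for (Map.Entry<String, String> e : concurrent.entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

This is why switching SQLConf to ConcurrentHashMap removes the need to sprinkle `settings.synchronized {}` around every iteration site.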
[GitHub] spark pull request: [MLLIB] SPARK-2329 Add multi-label evaluation ...
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/1270#discussion_r14409556 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MultilabelMetrics.scala --- @@ -0,0 +1,172 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.evaluation + +import org.apache.spark.Logging +import org.apache.spark.rdd.RDD +import org.apache.spark.SparkContext._ + +/** + * Evaluator for multilabel classification. + * NB: type Double both for prediction and label is retained + * for compatibility with model.predict that returns Double + * and MLUtils.loadLibSVMFile that loads class labels as Double + * + * @param predictionAndLabels an RDD of (predictions, labels) pairs, both are non-null sets. + */ +class MultilabelMetrics(predictionAndLabels: RDD[(Set[Double], Set[Double])]) extends Logging { + + private lazy val numDocs = predictionAndLabels.count + + private lazy val numLabels = predictionAndLabels.flatMap { case (_, labels) => labels }.distinct.count + + /** + * Returns strict Accuracy + * (for equal sets of labels) + * @return strictAccuracy. + */ + lazy val strictAccuracy = predictionAndLabels.filter { case (predictions, labels) => + predictions == labels }.count.toDouble / numDocs + + /** + * Returns Accuracy + * @return Accuracy. + */ + lazy val accuracy = predictionAndLabels.map { case (predictions, labels) => + labels.intersect(predictions).size.toDouble / labels.union(predictions).size }. --- End diff -- Do you suggest extracting labels.intersect(predictions).size as a lazy val? Would it then be calculated only once? That operation is performed on a Scala Set (not on an RDD). Another option might be to store all the intermediate calculations (including the intersection) in an RDD and reuse them across the six different measures. In that case I would need to fold over a six-element tuple, which would look kind of scary.
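The per-document score in the diff under discussion is the Jaccard-style ratio |labels ∩ predictions| / |labels ∪ predictions|, averaged over all documents. A minimal sketch of that arithmetic using plain Java sets (the class name and the two example documents are illustrative, not from the PR):

```java
import java.util.HashSet;
import java.util.Set;

public class MultilabelAccuracySketch {
    // Per-document score: |labels ∩ predictions| / |labels ∪ predictions|
    static double docScore(Set<Double> predictions, Set<Double> labels) {
        Set<Double> inter = new HashSet<>(labels);
        inter.retainAll(predictions); // intersection
        Set<Double> union = new HashSet<>(labels);
        union.addAll(predictions);    // union
        return (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        // Two hypothetical documents: one perfect prediction, one partial.
        double d1 = docScore(Set.of(0.0, 1.0), Set.of(0.0, 1.0)); // 1.0
        double d2 = docScore(Set.of(0.0), Set.of(0.0, 2.0));      // 0.5
        System.out.println((d1 + d2) / 2);                        // 0.75
    }
}
```

As the comment notes, the intersection and union here are cheap in-memory Set operations per record; only the averaging step runs over the distributed collection.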
[GitHub] spark pull request: [MLLIB] SPARK-2329 Add multi-label evaluation ...
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/1270#discussion_r14409643 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MultilabelMetrics.scala --- @@ -0,0 +1,172 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.evaluation + +import org.apache.spark.Logging +import org.apache.spark.rdd.RDD +import org.apache.spark.SparkContext._ + +/** + * Evaluator for multilabel classification. + * NB: type Double both for prediction and label is retained + * for compatibility with model.predict that returns Double + * and MLUtils.loadLibSVMFile that loads class labels as Double + * + * @param predictionAndLabels an RDD of (predictions, labels) pairs, both are non-null sets. + */ +class MultilabelMetrics(predictionAndLabels: RDD[(Set[Double], Set[Double])]) extends Logging { + + private lazy val numDocs = predictionAndLabels.count + + private lazy val numLabels = predictionAndLabels.flatMap { case (_, labels) => labels }.distinct.count + + /** + * Returns strict Accuracy + * (for equal sets of labels) + * @return strictAccuracy. + */ + lazy val strictAccuracy = predictionAndLabels.filter { case (predictions, labels) => + predictions == labels }.count.toDouble / numDocs + + /** + * Returns Accuracy + * @return Accuracy. + */ + lazy val accuracy = predictionAndLabels.map { case (predictions, labels) => + labels.intersect(predictions).size.toDouble / labels.union(predictions).size }. + fold(0.0)(_ + _) / numDocs + --- End diff -- The fold operation is performed on an RDD. I didn't find sum in the RDD interface, which is why I used fold; I would be happy to use sum instead. http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD
[GitHub] spark pull request: [SPARK-2327] [SQL] Fix nullabilities of Join/G...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1266#issuecomment-47672298 Merged build started.
[GitHub] spark pull request: [SPARK-2327] [SQL] Fix nullabilities of Join/G...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1266#issuecomment-47672282 Merged build triggered.
[GitHub] spark pull request: update the comments in SqlParser
GitHub user CodingCat opened a pull request: https://github.com/apache/spark/pull/1275 update the comments in SqlParser SqlParser has been case-insensitive after https://github.com/apache/spark/commit/dab5439a083b5f771d5d5b462d0d517fa8e9aaf2 was merged You can merge this pull request into a Git repository by running: $ git pull https://github.com/CodingCat/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1275.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1275 commit 17931cd5cc79b406104b2e99f3131aa833e360ce Author: CodingCat zhunans...@gmail.com Date: 2014-07-01T16:27:39Z update the comments in SqlParser
[GitHub] spark pull request: update the comments in SqlParser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1275#issuecomment-47677356 Merged build triggered.
[GitHub] spark pull request: [SPARK-2003] Fix python SparkContext example
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/1246#issuecomment-47679545 fyi, this pull request does not change the doc re setMaster
[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/964#issuecomment-47681084 @vrilleup Just checked Matlab's svd and svds. I don't remember ever using options.{tol, maxit}, so I wonder whether they are useful to expose to users. I did use RCOND before because I needed a very accurate solution, but that work was purely academic. In MLlib's implementation we take the A^T A approach, which can't give very accurate small singular values if the matrix is ill-conditioned, so this is not useful either. My suggestion for the type signature is simply: ~~~ def computeSVD(k: Int, computeU: Boolean) ~~~ Let's estimate the complexity of the dense approach and the iterative approach and decide which to use internally. We can expose advanced options later, e.g. rcond, iter, method: {dense, arpack}, etc. What do you think?
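The accuracy limitation mengxr mentions follows from squaring: if $\sigma_i$ are the singular values of $A$ with right singular vectors $v_i$, the Gram-matrix approach recovers them as eigenvalues of $A^\top A$, which squares the condition number:

```latex
A^\top A \, v_i = \sigma_i^2 \, v_i,
\qquad
\kappa(A^\top A) = \left(\frac{\sigma_{\max}}{\sigma_{\min}}\right)^2 = \kappa(A)^2.
```

Roughly speaking, any singular value below about $\sqrt{\varepsilon}\,\sigma_{\max}$ (with $\varepsilon$ the machine epsilon) has a square that falls below roundoff relative to $\sigma_{\max}^2$, so small singular values of an ill-conditioned matrix cannot be recovered accurately this way.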
[GitHub] spark pull request: [SPARK-2334] fix rdd.id() Attribute Error
GitHub user dianacarroll opened a pull request: https://github.com/apache/spark/pull/1276 [SPARK-2334] fix rdd.id() Attribute Error rdd.id() was raising an AttributeError in some cases because self._id is not getting set. So instead of returning the _id attribute, return the value of id() from the jrdd. Fixes bug SPARK-2334. Test with: sc.parallelize([1,2,3]).map(lambda x: x+1).id() You can merge this pull request into a Git repository by running: $ git pull https://github.com/dianacarroll/spark SPARK-2334 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1276.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1276 commit a69e6a9dde47b2eb8bb99e4a67013c107df80eb9 Author: Diana Carroll dcarr...@cloudera.com Date: 2014-07-01T17:08:45Z rdd.id(): return id of underlying jrdd In some cases self._id is not getting set and calls to id() are therefore resulting in an AttributeError. This change fixes that by returning the id of the underlying jrdd instead. Test case: sc.parallelize([1,2,3]).map(lambda x: x+1).id()
[GitHub] spark pull request: [SPARK-2334] fix rdd.id() Attribute Error
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1276#issuecomment-47683485 Merged build triggered.
[GitHub] spark pull request: [WIP][SPARK-2174][MLLIB] treeReduce and treeAg...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1110#issuecomment-47683286 We benchmarked treeReduce in our random forest implementation, and since the trees generated from each partition are fairly large (more than 100MB), we found that treeReduce can significantly reduce the shuffle time from 6mins to 2mins. Nice work!
[GitHub] spark pull request: update the comments in SqlParser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1275#issuecomment-47684501 Merged build finished.
[GitHub] spark pull request: update the comments in SqlParser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1275#issuecomment-47684503 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16283/
[GitHub] spark pull request: [SPARK-2327] [SQL] Fix nullabilities of Join/G...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1266#issuecomment-47684505 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16282/
[GitHub] spark pull request: [SPARK-2327] [SQL] Fix nullabilities of Join/G...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1266#issuecomment-47684504 Merged build finished. All automated tests passed.
[GitHub] spark pull request: [WIP][SPARK-2174][MLLIB] treeReduce and treeAg...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1110#issuecomment-47686100 @dbtsai Thanks for testing it! I'm going to move `treeReduce` and `treeAggregate` to `mllib.rdd.RDDFunctions`. For normal data processing, people generally use more partitions than number of cores. In those cases, the driver can collect task result while other tasks are running. This is not the optimal case for machine learning algorithms. So I think we can keep `treeReduce` and `treeAggregate` in mllib for now.
[GitHub] spark pull request: SPARK-1982 regression: saveToParquetFile doesn...
Github user AndreSchumacher commented on the pull request: https://github.com/apache/spark/pull/934#issuecomment-47688117 True. Thanks for reminding. Closing this now.
[GitHub] spark pull request: SPARK-1982 regression: saveToParquetFile doesn...
Github user AndreSchumacher closed the pull request at: https://github.com/apache/spark/pull/934
[GitHub] spark pull request: [SPARK-2324] SparkContext should not exit dire...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1274#discussion_r14418731 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -115,8 +121,9 @@ private[spark] class DiskBlockManager(shuffleManager: ShuffleBlockManager, rootD private def createLocalDirs(): Array[File] = { logDebug(s"Creating local directories at root dirs '$rootDirs'") + val localDirsResult = ArrayBuffer[File]() val dateFormat = new SimpleDateFormat("MMddHHmmss") - rootDirs.split(",").map { rootDir => + rootDirs.split(",").foreach { rootDir => --- End diff -- Scala style thing: you can use flatMap instead of foreach here and return None in the case where directory creation failed and Some(localDir) in the case where it worked.
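The flatMap-over-Option pattern aarondav suggests can be sketched with plain Java streams, using `Optional` in place of Scala's `Option` (`tryCreateDir` and the sample paths are hypothetical stand-ins for the directory-creation attempt, not Spark code):

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class FlatMapOptionDemo {
    // Hypothetical stand-in for directory creation: fails for paths
    // containing "bad", succeeds otherwise.
    static Optional<String> tryCreateDir(String rootDir) {
        return rootDir.contains("bad")
            ? Optional.empty()
            : Optional.of(rootDir + "/spark-local");
    }

    public static void main(String[] args) {
        // flatMap silently drops the empty Optionals, keeping only the
        // directories that were created, so no mutable buffer is needed.
        List<String> localDirs = List.of("/data1", "/data2-bad", "/data3").stream()
            .map(FlatMapOptionDemo::tryCreateDir)
            .flatMap(Optional::stream) // Java 9+: Optional -> 0- or 1-element stream
            .collect(Collectors.toList());
        System.out.println(localDirs); // [/data1/spark-local, /data3/spark-local]
    }
}
```

The appeal over `foreach` plus an `ArrayBuffer` is the same in either language: the successful results are collected declaratively and the failure case is represented in the element type rather than by side effects.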
[GitHub] spark pull request: [SPARK-2334] fix rdd.id() Attribute Error
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1276#issuecomment-47688685 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16284/
[GitHub] spark pull request: [SPARK-2334] fix rdd.id() Attribute Error
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1276#issuecomment-47688683 Merged build finished. All automated tests passed.
[GitHub] spark pull request: [SPARK-2324] SparkContext should not exit dire...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1274#discussion_r14418861 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -137,11 +144,12 @@ private[spark] class DiskBlockManager(shuffleManager: ShuffleBlockManager, rootD } if (!foundLocalDir) { logError(s"Failed $MAX_DIR_CREATION_ATTEMPTS attempts to create local dir in $rootDir") --- End diff -- Maybe add to this log to say that you are ignoring this directory moving forward. e.g., something simple like, ```scala logError(s"Failed $MAX_DIR_CREATION_ATTEMPTS attempts to create local dir in $rootDir. " + "Ignoring this directory.") ```
[GitHub] spark pull request: [SPARK-2174][MLLIB] treeReduce and treeAggrega...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1110#issuecomment-47688785 Merged build triggered.
[GitHub] spark pull request: [SPARK-2174][MLLIB] treeReduce and treeAggrega...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1110#issuecomment-47688798 Merged build started.
[GitHub] spark pull request: [SPARK-2324] SparkContext should not exit dire...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1274#discussion_r14418939 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -26,6 +26,8 @@ import org.apache.spark.executor.ExecutorExitCode import org.apache.spark.network.netty.{PathResolver, ShuffleSender} import org.apache.spark.util.Utils +import scala.collection.mutable.ArrayBuffer --- End diff -- nit: this scala.* import should go into its own block between the java.* and org.* imports. See our [style guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Imports).
[GitHub] spark pull request: [SPARK-2324] SparkContext should not exit dire...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1274#issuecomment-47689007 Jenkins, ok to test.
[GitHub] spark pull request: [SPARK-2324] SparkContext should not exit dire...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1274#issuecomment-47690237 This change seems reasonable: on large clusters we occasionally see that a single disk on a single machine has failed, and this can cause the entire application to crash because the executor keeps getting restarted until the Master kills the application. It also allows a more uniform configuration for heterogeneous clusters with different numbers of disks. The downside of this behavioral change is that a misconfiguration, like mistyping one of your local dirs, may go unnoticed for a while, though it should become apparent after a `df` or a look at any of the executor logs. The fail-fast approach is generally better, but current Spark does not do a good job of communicating the reason for executors that crash immediately upon startup.
[GitHub] spark pull request: [SPARK-2174][MLLIB] treeReduce and treeAggrega...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1110#issuecomment-47693931 Merged build finished. All automated tests passed.
[GitHub] spark pull request: [SPARK-2174][MLLIB] treeReduce and treeAggrega...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1110#issuecomment-47693932 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16285/
[GitHub] spark pull request: [HOTFIX] Synchronize on SQLContext.settings in...
GitHub user concretevitamin opened a pull request: https://github.com/apache/spark/pull/1277 [HOTFIX] Synchronize on SQLContext.settings in tests. Let's see if this fixes the ongoing series of test failures in a master build machine (https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT-pre-YARN/SPARK_HADOOP_VERSION=1.0.4,label=centos/81/). @pwendell @marmbrus You can merge this pull request into a Git repository by running: $ git pull https://github.com/concretevitamin/spark test-fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1277.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1277 commit 28c88bd3aca1336025dbb808e3fdf6ad7ed688ab Author: Zongheng Yang zonghen...@gmail.com Date: 2014-07-01T18:42:26Z Synchronize on SQLContext.settings in tests.
[GitHub] spark pull request: [HOTFIX] Synchronize on SQLContext.settings in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1277#issuecomment-47694608 Merged build triggered.
[GitHub] spark pull request: [HOTFIX] Synchronize on SQLContext.settings in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1277#issuecomment-47694618 Merged build started.
[GitHub] spark pull request: [SPARK-2337] SQL String Interpolation
GitHub user ahirreddy opened a pull request: https://github.com/apache/spark/pull/1278 [SPARK-2337] SQL String Interpolation ```scala val sqlContext = new SQLContext(...) import sqlContext._ case class Person(name: String, age: Int) val people: RDD[Person] = ... val srdd = sql"SELECT * FROM $people" ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/ahirreddy/spark sql-interpolation Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1278.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1278 commit 99d158ec3a47d8a48e38f522607a4c4539027ea2 Author: Ahir Reddy ahirre...@gmail.com Date: 2014-07-01T18:32:47Z Comment commit b8fea95ffe190bd353e3a75be7af56184680d8aa Author: Ahir Reddy ahirre...@gmail.com Date: 2014-07-01T18:44:51Z Added to comment
[GitHub] spark pull request: update the comments in SqlParser
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1275#issuecomment-47695864 Jenkins, test this please.
[GitHub] spark pull request: update the comments in SqlParser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1275#issuecomment-47696331 Merged build triggered.
[GitHub] spark pull request: [SPARK-2337] SQL String Interpolation
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1278#issuecomment-47696328 Merged build triggered.
[GitHub] spark pull request: [SPARK-2337] SQL String Interpolation
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1278#issuecomment-47696347 Merged build started.
[GitHub] spark pull request: update the comments in SqlParser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1275#issuecomment-47696348 Merged build started.
[GitHub] spark pull request: update the comments in SqlParser
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/1275#issuecomment-47703609 I just took a look at the running Jenkins... still a lot of errors... weird
[GitHub] spark pull request: [HOTFIX] Synchronize on SQLContext.settings in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1277#issuecomment-47705529 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16286/
[GitHub] spark pull request: [HOTFIX] Synchronize on SQLContext.settings in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1277#issuecomment-47705528 Merged build finished. All automated tests passed.
[GitHub] spark pull request: [SPARK-2337] SQL String Interpolation
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1278#issuecomment-47707177 Merged build finished.
[GitHub] spark pull request: update the comments in SqlParser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1275#issuecomment-47707179 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16288/
[GitHub] spark pull request: [SPARK-2337] SQL String Interpolation
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1278#issuecomment-47707181 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16287/
[GitHub] spark pull request: update the comments in SqlParser
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1275#issuecomment-47707178 Merged build finished.
[GitHub] spark pull request: [MLLIB] SPARK-2329 Add multi-label evaluation ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/1270#discussion_r14428136

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MultilabelMetrics.scala ---
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.evaluation
+
+import org.apache.spark.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.SparkContext._
+
+/**
+ * Evaluator for multilabel classification.
+ * NB: type Double both for prediction and label is retained
+ * for compatibility with model.predict that returns Double
+ * and MLUtils.loadLibSVMFile that loads class labels as Double
+ *
+ * @param predictionAndLabels an RDD of (predictions, labels) pairs, both are non-null sets.
+ */
+class MultilabelMetrics(predictionAndLabels: RDD[(Set[Double], Set[Double])]) extends Logging {
+
+  private lazy val numDocs = predictionAndLabels.count
+
+  private lazy val numLabels = predictionAndLabels.flatMap { case (_, labels) => labels }.distinct.count
+
+  /**
+   * Returns strict Accuracy
+   * (for equal sets of labels)
+   * @return strictAccuracy.
+   */
+  lazy val strictAccuracy = predictionAndLabels.filter { case (predictions, labels) =>
+    predictions == labels }.count.toDouble / numDocs
+
+  /**
+   * Returns Accuracy
+   * @return Accuracy.
+   */
+  lazy val accuracy = predictionAndLabels.map { case (predictions, labels) =>
+    labels.intersect(predictions).size.toDouble / labels.union(predictions).size }.
+    fold(0.0)(_ + _) / numDocs
--- End diff --

Ah, `sum` is defined in `DoubleRDDFunctions`. But looking at the `map` call, it seems like it would produce an `RDD[Double]`? I would think you can call `sum`, if you import `org.apache.spark.rdd.DoubleRDDFunctions` maybe? Up to you what you like better.
[GitHub] spark pull request: SPARK-2159: Add support for stopping SparkCont...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/1230#issuecomment-47708185 Yeah, I should have said the `exit` command and not the functionality.
[GitHub] spark pull request: [MLLIB] SPARK-2329 Add multi-label evaluation ...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1270#discussion_r14428640

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MultilabelMetrics.scala ---
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.evaluation
+
+import org.apache.spark.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.SparkContext._
+
+/**
+ * Evaluator for multilabel classification.
+ * NB: type Double both for prediction and label is retained
+ * for compatibility with model.predict that returns Double
+ * and MLUtils.loadLibSVMFile that loads class labels as Double
+ *
+ * @param predictionAndLabels an RDD of (predictions, labels) pairs, both are non-null sets.
+ */
+class MultilabelMetrics(predictionAndLabels: RDD[(Set[Double], Set[Double])]) extends Logging {
+
+  private lazy val numDocs = predictionAndLabels.count
+
+  private lazy val numLabels = predictionAndLabels.flatMap { case (_, labels) => labels }.distinct.count
+
+  /**
+   * Returns strict Accuracy
+   * (for equal sets of labels)
+   * @return strictAccuracy.
+   */
+  lazy val strictAccuracy = predictionAndLabels.filter { case (predictions, labels) =>
+    predictions == labels }.count.toDouble / numDocs
+
+  /**
+   * Returns Accuracy
+   * @return Accuracy.
+   */
+  lazy val accuracy = predictionAndLabels.map { case (predictions, labels) =>
+    labels.intersect(predictions).size.toDouble / labels.union(predictions).size }.
+    fold(0.0)(_ + _) / numDocs
--- End diff --

After `import org.apache.spark.SparkContext._`, it should already be there as an implicit.
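Setting the `sum`-vs-`fold` question aside, the two metrics quoted in the diff are easy to check on plain Scala collections. This illustrative sketch (no Spark required, data made up for the example) mirrors what the RDD code computes:

```scala
// Re-implementation of the two multilabel metrics under discussion on plain
// Scala collections, to show what the RDD version computes per document.
val predictionAndLabels: Seq[(Set[Double], Set[Double])] = Seq(
  (Set(1.0, 2.0), Set(1.0, 2.0)), // exact match
  (Set(1.0),      Set(1.0, 3.0))  // partial match
)
val numDocs = predictionAndLabels.size

// Strict accuracy: fraction of documents whose predicted label set equals
// the true label set exactly.
val strictAccuracy =
  predictionAndLabels.count { case (p, l) => p == l }.toDouble / numDocs

// Jaccard-style accuracy: mean of |intersection| / |union| per document.
val accuracy = predictionAndLabels.map { case (p, l) =>
  l.intersect(p).size.toDouble / l.union(p).size
}.sum / numDocs
```

On the toy data above, the first document matches exactly (score 1.0) and the second overlaps in one of two labels (score 0.5), giving strict accuracy 0.5 and Jaccard accuracy 0.75.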
[GitHub] spark pull request: [SPARK-2165] spark on yarn: add support for se...
GitHub user knusbaum opened a pull request: https://github.com/apache/spark/pull/1279 [SPARK-2165] spark on yarn: add support for setting maxAppAttempts in the ApplicationSubmissionContext You can merge this pull request into a Git repository by running: $ git pull https://github.com/knusbaum/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1279.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1279 commit 41e8a394cd74e42f2228eb880442cb0d6902f275 Author: Kyle Nusbaum knusb...@yahoo-inc.com Date: 2014-06-24T20:19:16Z Testing commit c2a2b69b623a792bc3e7e1e278a2be2668573632 Author: Kyle Nusbaum knusb...@yahoo-inc.com Date: 2014-07-01T20:46:35Z Preparing for pull commit b69955080537bebccc1f2e4bf05ee584a1e429f9 Author: Kyle Nusbaum knusb...@yahoo-inc.com Date: 2014-07-01T20:48:44Z Merge remote-tracking branch 'community/master' commit 2532b6755ff2876516679b0c90e97fd031a111df Author: Kyle Nusbaum knusb...@yahoo-inc.com Date: 2014-07-01T21:05:15Z Cleanup
[GitHub] spark pull request: Update SQLConf.scala
Github user concretevitamin commented on the pull request: https://github.com/apache/spark/pull/1272#issuecomment-47717333 Yeah, what is motivating this change? When this class was introduced, @rxin commented that `java.util.ConcurrentHashMap` had a bad memory footprint and suggested the current approach instead.
[GitHub] spark pull request: Update SQLConf.scala
Github user concretevitamin commented on the pull request: https://github.com/apache/spark/pull/1272#issuecomment-47717946 Sorry, I didn't realize Reynold had already commented on this thread. The current changes with Option look good.
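The pattern being approved here, a plain mutable map guarded by `synchronized`, with reads returning `Option` instead of throwing, can be sketched as follows. The class and method names are illustrative, not SQLConf's actual API:

```scala
import scala.collection.mutable

// Hypothetical sketch of an Option-based, synchronized settings holder.
class SettingsConf {
  private val settings = mutable.HashMap[String, String]()

  def set(key: String, value: String): Unit = settings.synchronized {
    settings(key) = value
  }

  // Reads return Option, so a missing key is not an exception.
  def getOption(key: String): Option[String] = settings.synchronized {
    settings.get(key)
  }

  def get(key: String, default: String): String =
    getOption(key).getOrElse(default)
}

val conf = new SettingsConf
conf.set("spark.sql.shuffle.partitions", "10")
```

Compared to `java.util.ConcurrentHashMap`, this keeps the memory footprint of a plain `HashMap` at the cost of coarse-grained locking, which is fine for a small, rarely contended configuration map.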
[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK
Github user vrilleup commented on the pull request: https://github.com/apache/spark/pull/964#issuecomment-47722138 @mengxr Yes, you are right. Keeping the API simple might be more important than flexibility. I think `tol` can be set to a default value (e.g. 1e-10, which is the default in MATLAB), but `maxit` is related to k; 300 would be enough for most cases, or max(300, k*2) if k is large. For the naming, how about `computeTruncatedSVD`? (I saw this term used in many papers, also in my thesis.) Or `computeSVDs` (like `svds` in MATLAB)? I think separating `svd` and `svds` into two functions is better than deciding the dense/sparse impl internally for the user. It's hard to enumerate all use cases and decide an impl logic.
[GitHub] spark pull request: [WIP][SPARK-2340] Resolve History Server file ...
GitHub user andrewor14 opened a pull request: https://github.com/apache/spark/pull/1280 [WIP][SPARK-2340] Resolve History Server file paths properly We resolve relative paths to the local `file:/` system for `--jars` and `--files` in spark submit. We should do the same for the history server. TODO: make sure event logs are also resolved properly. TODO: test this on standalone and YARN clusters. You can merge this pull request into a Git repository by running: $ git pull https://github.com/andrewor14/spark hist-serv-fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1280.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1280
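The kind of resolution the PR describes, turning a bare relative path into an explicit `file:/` URI while leaving already-qualified URIs such as `hdfs://` untouched, can be sketched like this. `resolveURI` is a hypothetical helper for illustration, not the actual Spark utility:

```scala
import java.io.File
import java.net.URI

// Resolve a possibly-relative path to an explicit URI:
// - paths that already carry a scheme (hdfs://, file:/, ...) pass through,
// - bare local paths are made absolute and given the file: scheme.
def resolveURI(path: String): URI = {
  val uri = new URI(path)
  if (uri.getScheme != null) uri
  else new File(path).getAbsoluteFile.toURI
}
```

With this, `resolveURI("hdfs://nn/logs")` keeps the `hdfs` scheme, while `resolveURI("eventlogs")` becomes a `file:` URI anchored at the current working directory, which is the ambiguity the history server fix is meant to remove.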
[GitHub] spark pull request: [WIP][SPARK-2340] Resolve History Server file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1280#issuecomment-47723879 Merged build triggered.
[GitHub] spark pull request: [WIP][SPARK-2340] Resolve History Server file ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1280#issuecomment-47723890 Merged build started.
[GitHub] spark pull request: [SPARK-2340] Resolve History Server file paths...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1280#issuecomment-47725651 Merged build started.