[GitHub] spark pull request: ignore cache paths for RAT tests
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4569#issuecomment-74118271 [Test build #27361 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27361/consoleFull) for PR 4569 at commit [`d0c9e7e`](https://github.com/apache/spark/commit/d0c9e7eef3b88318c847863b25a5713e1f4f0287). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: ignore cache paths for RAT tests
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4569#issuecomment-74118289 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27361/
[GitHub] spark pull request: [SPARK-5655] Don't chmod700 application files ...
Github user growse commented on the pull request: https://github.com/apache/spark/pull/4509#issuecomment-74132632 Merges cleanly into 1.2 here. Personally, I would find it immensely useful to have this back-ported there. On 12 Feb 2015 18:53, Marcelo Vanzin notificati...@github.com wrote: Yes, this should go into 1.2. Should be a clean merge, though. (https://github.com/apache/spark/pull/4509#issuecomment-74128977)
[GitHub] spark pull request: [SPARK-3299][SQL] add to SQLContext API to sho...
Github user bbejeck commented on the pull request: https://github.com/apache/spark/pull/3872#issuecomment-74133969 Closing this pull request in deference to the work being done in PR #4547
[GitHub] spark pull request: [SPARK-5778] throw if nonexistent metrics conf...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4571#issuecomment-74136376 [Test build #27368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27368/consoleFull) for PR 4571 at commit [`105288e`](https://github.com/apache/spark/commit/105288e2c661ed631e64cfdc3aeb79b2aea858eb). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-4267 [YARN] Failing to launch jobs on Sp...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4452#issuecomment-74137303 @andrewor14 In light of https://github.com/apache/spark/commit/466b1f671b21f575d28f9c103f51765790914fe3 which is of similar severity and impact, I'm wondering if it does in fact make sense to back port to 1.2? I'm still getting the feel of what the rules of thumb are for backporting to minor release n-1, n-2, etc.
[GitHub] spark pull request: [SPARK-5780] [PySpark] Mute the logging during...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4572#issuecomment-74137239 [Test build #27370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27370/consoleFull) for PR 4572 at commit [`1e9069c`](https://github.com/apache/spark/commit/1e9069cdd6bae3178a14bfb44450e571564323d7). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-5776 JIRA version not of form x.y.z brea...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/4570 SPARK-5776 JIRA version not of form x.y.z breaks merge_spark_pr.py Consider only x.y.z versions from JIRA. CC @JoshRosen, who will probably know this script well. The alternative is to call the version 2.0.0 after all in JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark SPARK-5776 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4570.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4570 commit fffafdeb8b60ccfd17344fec5aa3d944eba1ea3f Author: Sean Owen so...@cloudera.com Date: 2015-02-12T19:04:22Z Consider only x.y.z verisons from JIRA
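The idea of considering only x.y.z versions can be sketched in Python as below. This is a minimal illustration, not the code in merge_spark_pr.py: the function names, the regular expression, and the sample version names (including the non-conforming "2+") are assumptions made for the example.

```python
import re

# Hypothetical pattern: exactly three dot-separated integers, e.g. "1.3.0".
X_Y_Z = re.compile(r"\d+\.\d+\.\d+")


def only_xyz_versions(version_names):
    """Keep only names of the form x.y.z, preserving their order."""
    return [name for name in version_names if X_Y_Z.fullmatch(name)]


def sort_newest_first(version_names):
    """Sort x.y.z names numerically (not lexically), newest first."""
    return sorted(
        version_names,
        key=lambda v: tuple(int(part) for part in v.split(".")),
        reverse=True,
    )


# Illustrative input only; "2+" stands in for any name that is not x.y.z.
names = ["1.3.0", "2+", "1.2.2", "1.10.0"]
print(sort_newest_first(only_xyz_versions(names)))
```

Filtering before sorting matters here: converting each component with `int` would raise on a name such as "2+", which is the kind of failure the pull request title describes.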
[GitHub] spark pull request: [SPARK-5780] [PySpark] Mute the logging during...
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/4572 [SPARK-5780] [PySpark] Mute the logging during unit tests There is a lot of logging coming from the driver and workers. It is noisy and alarming, and it contains many exceptions, so people are confused about whether the tests are failing or not. This PR mutes the logging during tests and shows it only if a test fails. You can merge this pull request into a Git repository by running: $ git pull https://github.com/davies/spark mute Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4572.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4572 commit 1e9069cdd6bae3178a14bfb44450e571564323d7 Author: Davies Liu dav...@databricks.com Date: 2015-02-12T19:20:03Z mute the logging during python tests
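The "mute during tests, show only on failure" behaviour can be sketched with the Python standard library as follows. This is an assumption-laden illustration, not what the pull request does: it only buffers records from Python's own `logging` module, whereas the noise described above comes from the driver and workers, and `run_quietly` and `NoisyTest` are hypothetical names.

```python
import io
import logging
import unittest


def run_quietly(suite):
    """Run a unittest suite with root-logger output buffered.

    Returns (result, captured_log_text). The caller decides whether to show it.
    """
    buffer = io.StringIO()
    capture = logging.StreamHandler(buffer)
    root = logging.getLogger()
    saved = list(root.handlers)
    # Swap the existing handlers for the buffering one while tests run.
    for handler in saved:
        root.removeHandler(handler)
    root.addHandler(capture)
    try:
        result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    finally:
        # Always restore the original handlers, even if the runner raises.
        root.removeHandler(capture)
        for handler in saved:
            root.addHandler(handler)
    return result, buffer.getvalue()


class NoisyTest(unittest.TestCase):
    """Hypothetical test that logs a warning but passes."""

    def test_passes(self):
        logging.getLogger("demo").warning("noisy but harmless")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(NoisyTest)
result, captured = run_quietly(suite)
if not result.wasSuccessful():
    # Only surface the buffered log output when something actually failed.
    print(captured)
```

Because the example test passes, nothing is printed; the warning stays in `captured` and would be shown only after a failure.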
[GitHub] spark pull request: [SPARK-5758][SQL] Use LongType as the default ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4544#issuecomment-74143193 [Test build #27365 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27365/consoleFull) for PR 4544 at commit [`6e2ffc2`](https://github.com/apache/spark/commit/6e2ffc2edd3922e69bc3f02e72905e68c072cfb9). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SQL] Improve error messages
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4558#issuecomment-74129699 [Test build #27364 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27364/consoleFull) for PR 4558 at commit [`5e5ab50`](https://github.com/apache/spark/commit/5e5ab50243c2ef9bd36dc5db3391a6bab9d54fc3). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-74129728 I agree with @tgravescs that we should avoid using private APIs where possible. On top of that, this patch would break the build for anything older than hadoop-2.6. If using these APIs, you'd have to use reflection to load them (or use shims like sql/hive does). When I thought about this problem, the biggest issue was how to distribute the encryption key securely. If we can do that using the `UserGroupInformation` and `Token` classes, adding the actual encryption to the block manager shouldn't be hard.
[GitHub] spark pull request: [SPARK-5758][SQL] Use LongType as the default ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4544#issuecomment-74129697 [Test build #27365 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27365/consoleFull) for PR 4544 at commit [`6e2ffc2`](https://github.com/apache/spark/commit/6e2ffc2edd3922e69bc3f02e72905e68c072cfb9). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5760][SPARK-5761] Fix standalone rest p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4557#issuecomment-74129705 [Test build #27363 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27363/consoleFull) for PR 4557 at commit [`b4dc980`](https://github.com/apache/spark/commit/b4dc9801b77deb7956cbe8065b4599d6819acf5b). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-3172 and SPARK-3577
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/2504#discussion_r24613511
--- Diff: core/src/main/scala/org/apache/spark/Aggregator.scala ---
    @@ -54,14 +55,13 @@ case class Aggregator[K, V, C] (
           }
           combiners.iterator
         } else {
    -      val combiners = new ExternalAppendOnlyMap[K, V, C](createCombiner, mergeValue, mergeCombiners)
    -      combiners.insertAll(iter)
    -      // Update task metrics if context is not null
           // TODO: Make context non optional in a future release
    -      Option(context).foreach { c =>
    -        c.taskMetrics.memoryBytesSpilled += combiners.memoryBytesSpilled
    -        c.taskMetrics.diskBytesSpilled += combiners.diskBytesSpilled
    -      }
    +      val spillMetrics = Option(context).map(
    +        _.taskMetrics.getOrCreateShuffleReadSpillMetrics()).getOrElse(new WriteMetrics())
--- End diff --
If we're doing map-side combining, this should be for the shuffle write, right?
[GitHub] spark pull request: [SPARK-5783] Better eventlog-parsing error mes...
GitHub user ryan-williams opened a pull request: https://github.com/apache/spark/pull/4573 [SPARK-5783] Better eventlog-parsing error messages You can merge this pull request into a Git repository by running: $ git pull https://github.com/ryan-williams/spark history Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4573.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4573 commit b668b529d57e7963f2f08384f1fc5f24587cf989 Author: Ryan Williams ryan.blake.willi...@gmail.com Date: 2015-02-12T19:00:49Z add log info line to history-eventlog parsing commit 8deecf06face194d500c36e6dcfdd6a4e74cb604 Author: Ryan Williams ryan.blake.willi...@gmail.com Date: 2015-02-12T19:02:04Z add line number to history-parsing error message commit 98aa3fe1fcfc6ffafbd8481761dea1acd82e99c5 Author: Ryan Williams ryan.blake.willi...@gmail.com Date: 2015-02-12T19:02:30Z include filename in history-parsing error message
[GitHub] spark pull request: SPARK-5776 JIRA version not of form x.y.z brea...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4570#issuecomment-74140973 LGTM
[GitHub] spark pull request: [SPARK-5783] Better eventlog-parsing error mes...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4573#issuecomment-74141979 [Test build #27372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27372/consoleFull) for PR 4573 at commit [`98aa3fe`](https://github.com/apache/spark/commit/98aa3fe1fcfc6ffafbd8481761dea1acd82e99c5). * This patch merges cleanly.
[GitHub] spark pull request: [EC2] Update default Spark version to 1.2.1
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4566#issuecomment-74126927 [Test build #27362 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27362/consoleFull) for PR 4566 at commit [`77e7840`](https://github.com/apache/spark/commit/77e7840d17ed8b9cc050d25256522b9901f48a1e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5655] Don't chmod700 application files ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4509
[GitHub] spark pull request: [SPARK-5765][Examples]Fixed word split problem...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/4561#issuecomment-74138649 LGTM. @pwendell @srowen We should backport this to `branch-1.2` and `branch-1.3`.
[GitHub] spark pull request: SPARK-4267 [YARN] Failing to launch jobs on Sp...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4452#issuecomment-74140364 In general there aren't really any strict rules for how far a patch should be back ported. One thing, though, is that we are much less likely to do more releases on older branches like 0.9, and people don't generally expect it either; they may instead opt to upgrade to newer minor releases, which is desirable. So I'd say we usually just determine this on a case-by-case basis. For this particular patch it is definitely relevant for 1.2. For older branches, however, I believe a stable version of Hadoop 2.5 may not even have been released yet, so I wonder how relevant it is. There will likely be nontrivial merge conflicts for branch-1.2 since we removed alpha support in 1.3 and refactored the YARN code quite a bit. Would you mind opening this PR against the 1.2 branch nevertheless?
[GitHub] spark pull request: SPARK-3172 and SPARK-3577
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/2504#issuecomment-74140271 Ok that makes sense... now my concern is whether it's possible to do that without changing the developer API (see my in-line comment). A few other things:
- Can you fix the import ordering? :)
- It looks like, in external sorter, right now you just keep track of the spilled bytes. Can you add the spill time there too?
[GitHub] spark pull request: SPARK-5747: Fix wordsplitting bugs in make-dis...
Github user dyross commented on the pull request: https://github.com/apache/spark/pull/4540#issuecomment-74125274 I've updated the commit wording and fixed a few more instances, as requested. The script seems to execute fine in our build system, but I am not 100% sure about all the edge cases.
[GitHub] spark pull request: [SPARK-5655] Don't chmod700 application files ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4509#issuecomment-74126618 2 people have reviewed it, tests pass, looks like a clean fix. Since it could affect many users running YARN, I think it should back-port into 1.3. (The logic this changes wasn't present earlier, it appears.)
[GitHub] spark pull request: SPARK-5747: Fix wordsplitting bugs in make-dis...
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/4540#discussion_r24605158
--- Diff: make-distribution.sh ---
    @@ -100,7 +100,7 @@ if [ -z $JAVA_HOME ]; then
       if [ $(command -v rpm) ]; then
         RPM_JAVA_HOME=$(rpm -E %java_home 2>/dev/null)
--- End diff --
I think we need to quote the subshell, since its output may contain spaces.
```
RPM_JAVA_HOME="$(rpm -E %java_home 2>/dev/null)"
```
[GitHub] spark pull request: SPARK-5747: Fix wordsplitting bugs in make-dis...
Github user dyross commented on the pull request: https://github.com/apache/spark/pull/4540#issuecomment-74129343 Thanks for the review, @nchammas.
[GitHub] spark pull request: SPARK-5747: Fix wordsplitting bugs in make-dis...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/4540#issuecomment-74129298 This patch LGTM.
[GitHub] spark pull request: [SPARK-5762] Fix shuffle write time for sort-b...
Github user ksakellis commented on a diff in the pull request: https://github.com/apache/spark/pull/4559#discussion_r24608462
--- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala ---
    @@ -740,6 +741,8 @@ private[spark] class ExternalSorter[K, V, C](
               in.close()
             }
           }
    +      context.taskMetrics.shuffleWriteMetrics.map(
--- End diff --
Should we write this metric if there was an exception thrown above? If so, maybe we should add this in the finally block. If not, then this is ok.
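The trade-off raised in this comment can be illustrated with a small Python sketch. It is not the Scala code under review: `WriteMetrics` and `timed_write` are hypothetical stand-ins, and the sketch shows only the variant where the metric is recorded in a `finally` block so that it is updated even when the write raises.

```python
import time


class WriteMetrics:
    """Hypothetical stand-in for a task's shuffle write metrics."""

    def __init__(self):
        self.write_time_ns = 0


def timed_write(metrics, write_fn):
    """Call write_fn and add its elapsed time to metrics, even on failure."""
    start = time.perf_counter_ns()
    try:
        return write_fn()
    finally:
        # Runs whether write_fn returned or raised, so a failed write is
        # still accounted for. Moving this line after the try block instead
        # would skip the update whenever an exception propagates.
        metrics.write_time_ns += time.perf_counter_ns() - start


def failing_write():
    """Simulate a write that spends some time and then fails."""
    time.sleep(0.001)
    raise IOError("simulated write failure")


metrics = WriteMetrics()
try:
    timed_write(metrics, failing_write)
except IOError:
    pass
print(metrics.write_time_ns > 0)
```

Whether a failed write should count toward the metric is the open question in the comment; the sketch only shows how each choice maps onto the placement of the update.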
[GitHub] spark pull request: [SQL] Improve error messages
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4558#issuecomment-74129362 Thanks for the PR!
[GitHub] spark pull request: [SPARK-4553] [SPARK-5767] [SQL] Wires Parquet ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/4563#discussion_r24608728
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
    @@ -226,13 +247,14 @@ case class ParquetRelation2
               f -> new Footer(f.getPath, parquetMetadata)
             }.seq.toMap
    -        partitionSpec = {
    -          val partitionDirs = dataStatuses
    +        partitionSpec = maybePartitionSpec.getOrElse {
    +          val partitionDirs = leaves
--- End diff --
Using all files rather than only Parquet part-files, because there can be empty partitions. (Should we include entirely empty directories here? Namely, directories that don't even contain `_metadata` or `_common_metadata`.)
[GitHub] spark pull request: [SPARK-5762] Fix shuffle write time for sort-b...
Github user ksakellis commented on the pull request: https://github.com/apache/spark/pull/4559#issuecomment-74130013 LGTM. @sryza I think it's okay to include the reading time, since you can argue this whole operation is for writing out the file, even if there is some reading.
[GitHub] spark pull request: SPARK-5776 JIRA version not of form x.y.z brea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4570#issuecomment-74131521 [Test build #27367 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27367/consoleFull) for PR 4570 at commit [`fffafde`](https://github.com/apache/spark/commit/fffafdeb8b60ccfd17344fec5aa3d944eba1ea3f). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4553] [SPARK-5767] [SQL] Wires Parquet ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/4563#discussion_r24612662
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
    @@ -545,13 +578,16 @@ object ParquetRelation2 {
       // Whether we should merge schemas collected from all Parquet part-files.
       val MERGE_SCHEMA = mergeSchema
    -  // Hive Metastore schema, passed in when the Parquet relation is converted from Metastore
    +  // Hive Metastore schema, passed in when the Parquet relation is converted from Metastore.
       val METASTORE_SCHEMA = metastoreSchema
    -  // Default partition name to use when the partition column value is null or empty string
    +  // Schema of partition keys for partitioned Hive Metastore Parquet tables.
    +  val METASTORE_PARTITION_KEYS_SCHEMA = metastorePartitionKeysSchema
--- End diff --
Ah, forgot to remove this. It's not used any more since we construct `PartitionSpec` manually and pass it to `ParquetRelation2` when converting Metastore Parquet tables.
[GitHub] spark pull request: [SPARK-5771] Number of Cores in Completed Appl...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4567#issuecomment-74141527 Looks good for now. However, if we later extend dynamic allocation to standalone mode, executors will be coming and going all the time, in which case `coresMax` could grow arbitrarily large and might make less sense. @marsishandsome also, what happens if executors fail multiple times (but not enough to fail the application)? Do we count all those cores too?
[GitHub] spark pull request: [EC2] Update default Spark version to 1.2.1
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4566#issuecomment-74141528 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27362/
[GitHub] spark pull request: [EC2] Update default Spark version to 1.2.1
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4566#issuecomment-74141519 [Test build #27362 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27362/consoleFull) for PR 4566 at commit [`77e7840`](https://github.com/apache/spark/commit/77e7840d17ed8b9cc050d25256522b9901f48a1e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-5776 JIRA version not of form x.y.z brea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4570#issuecomment-74145055 [Test build #27367 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27367/consoleFull) for PR 4570 at commit [`fffafde`](https://github.com/apache/spark/commit/fffafdeb8b60ccfd17344fec5aa3d944eba1ea3f). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-5776 JIRA version not of form x.y.z brea...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4570#issuecomment-74145067 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27367/
[GitHub] spark pull request: [SPARK-3299][SQL] add to SQLContext API to sho...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3872#issuecomment-74124497 @bbejeck Yes, please close yours. We can continue the work in #4547. Also, feel free to leave comments in that PR. Thank you for working on it!
[GitHub] spark pull request: [SPARK-5335] Fix deletion of security groups w...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/4122#issuecomment-74127969 LGTM
[GitHub] spark pull request: [SPARK-5757][MLLIB] replace SQL JSON usage in ...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4555#issuecomment-74128111 Merged into master and branch-1.3.
[GitHub] spark pull request: [SPARK-5757][MLLIB] replace SQL JSON usage in ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4555
[GitHub] spark pull request: [SPARK-5762] Fix shuffle write time for sort-b...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4559#issuecomment-74130616 [Test build #27366 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27366/consoleFull) for PR 4559 at commit [`ace156c`](https://github.com/apache/spark/commit/ace156c3bac206e772879c012ad58571f0a0bde7). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...
Github user ryan-williams commented on a diff in the pull request: https://github.com/apache/spark/pull/4525#discussion_r24613574
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -119,14 +138,79 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
     if (!conf.contains("spark.testing")) {
       logCheckingThread.setDaemon(true)
       logCheckingThread.start()
+      logLazyReplayThread.setDaemon(true)
+      logLazyReplayThread.start()
+    } else {
+      logLazyReplay()
     }
   }
-  override def getListing() = applications.values
+  /**
+   * Fetch and Parse the log files
+   */
+  private[history] def logLazyReplay() {
+    if(lazyApplications.isEmpty) return
+
+    logDebug("start doLazyReplay")
--- End diff --
`s/doLazyReplay/logLazyReplay/`?
[GitHub] spark pull request: [SPARK-5655] Don't chmod700 application files ...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4509#issuecomment-74139464 That's ok because the temp dir would be created underneath one of the root local dirs created by the `getOrCreateLocalRootDirs` method here: https://github.com/apache/spark/pull/4509/files#diff-d239aee594001f8391676e1047a0381eR698 So the top-most directory would have the right permissions (700 in the usual case, or whatever permissions Yarn sets in the case of Yarn containers). That's enough to prevent access to the underlying tree.
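The directory layout described in this comment can be sketched in shell. This is only an illustration under assumptions: the directory names `spark-tmp` and `app-files` are made up, `mktemp -d` stands in for the root local dir, and the script only prints the permission bits. Showing that another user is actually blocked would need a second account, because a non-owner needs execute permission on every parent directory to reach anything below it.

```shell
#!/usr/bin/env bash
# Hypothetical layout: root stands in for a root local dir; the nested
# names are illustrative and not taken from the Spark source.
root=$(mktemp -d)
chmod 700 "$root"                      # only the owner may enter the root

mkdir -p "$root/spark-tmp/app-files"
chmod -R 755 "$root/spark-tmp"         # children are deliberately left open

# Keep only the type and permission columns of `ls -ld`.
root_mode=$(ls -ld "$root" | cut -c1-10)
child_mode=$(ls -ld "$root/spark-tmp/app-files" | cut -c1-10)

echo "root:  $root_mode"
echo "child: $child_mode"

rm -rf "$root"
```

Even though the child directory is world-readable here, a different user would be stopped at the 700 root before reaching it.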
[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...
Github user ryan-williams commented on a diff in the pull request: https://github.com/apache/spark/pull/4525#discussion_r24613618
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -119,14 +138,79 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
     if (!conf.contains("spark.testing")) {
       logCheckingThread.setDaemon(true)
       logCheckingThread.start()
+      logLazyReplayThread.setDaemon(true)
+      logLazyReplayThread.start()
+    } else {
+      logLazyReplay()
     }
   }
-  override def getListing() = applications.values
+  /**
+   * Fetch and Parse the log files
+   */
+  private[history] def logLazyReplay() {
+    if(lazyApplications.isEmpty) return
+
+    logDebug("start doLazyReplay")
+    val mergeSize = 20
+    val bufferedApps = new ArrayBuffer[FsApplicationHistoryInfo](mergeSize)
+
+    def addIfAbsent(newApps: mutable.LinkedHashMap[String, FsApplicationHistoryInfo],
+        info: FsApplicationHistoryInfo) {
+      if (!newApps.contains(info.id) ||
+          newApps(info.id).logPath.endsWith(EventLoggingListener.IN_PROGRESS) &&
+          !info.logPath.endsWith(EventLoggingListener.IN_PROGRESS)) {
+        newApps += (info.id -> info)
+      }
+    }
+
+    def mergeApps(): mutable.LinkedHashMap[String, FsApplicationHistoryInfo] = {
+      val newApps = new mutable.LinkedHashMap[String, FsApplicationHistoryInfo]()
+      bufferedApps.sortWith(compareAppInfo)
+
+      val newIterator = bufferedApps.iterator.buffered
+      val oldIterator = applications.values.iterator.buffered
+      while (newIterator.hasNext && oldIterator.hasNext) {
+        if (compareAppInfo(newIterator.head, oldIterator.head)) {
+          addIfAbsent(newApps, newIterator.next())
+        } else {
+          addIfAbsent(newApps, oldIterator.next())
+        }
+      }
+      newIterator.foreach(addIfAbsent(newApps, _))
+      oldIterator.foreach(addIfAbsent(newApps, _))
+
+      newApps
+    }
+
+    val bus = new ReplayListenerBus()
+    while(lazyApplications.nonEmpty){
+      lazyApplications.iterator.take(mergeSize).foreach(keyValue => {
+        try{
+          val lazyInfo = keyValue._2
+          val info = replay(lazyInfo.eventLog, bus)
+          bufferedApps += info
+          logDebug("replay application " + lazyInfo.id + " successfully")
+        } catch {
+          case e: Exception =>
+        }
+      })
+      applications = mergeApps()
+      for(i <- 1 to bufferedApps.size) lazyApplications.remove(lazyApplications.head._1)
+      bufferedApps.clear()
+    }
+    logDebug("finish doLazyReplay")
--- End diff --
ditto
[GitHub] spark pull request: [SQL] Improve error messages
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4558#issuecomment-74142902 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27364/
[GitHub] spark pull request: [SQL] Improve error messages
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4558#issuecomment-74142895 [Test build #27364 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27364/consoleFull) for PR 4558 at commit [`5e5ab50`](https://github.com/apache/spark/commit/5e5ab50243c2ef9bd36dc5db3391a6bab9d54fc3). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-5776 JIRA version not of form x.y.z brea...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4570
[GitHub] spark pull request: [SPARK-5760][SPARK-5761] Fix standalone rest p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4557#issuecomment-74144197 [Test build #27363 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27363/consoleFull) for PR 4557 at commit [`b4dc980`](https://github.com/apache/spark/commit/b4dc9801b77deb7956cbe8065b4599d6819acf5b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5762] Fix shuffle write time for sort-b...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4559#issuecomment-74144214 [Test build #27366 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27366/consoleFull) for PR 4559 at commit [`ace156c`](https://github.com/apache/spark/commit/ace156c3bac206e772879c012ad58571f0a0bde7). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5762] Fix shuffle write time for sort-b...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4559#issuecomment-74144225 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27366/
[GitHub] spark pull request: [SPARK-5760][SPARK-5761] Fix standalone rest p...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4557#issuecomment-74144208 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27363/
[GitHub] spark pull request: [SPARK-4553] [SPARK-5767] [SQL] Wires Parquet ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/4563#discussion_r24616222
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -106,12 +106,12 @@ class DefaultSource
 ParquetRelation.createEmpty(
   path,
   data.schema.toAttributes,
-  false,
+  true,
--- End diff --
Use a named parameter for `allowExisting`. Also, it seems we will overwrite the metadata file. In the case of Append, do we still want to overwrite the metadata?
[GitHub] spark pull request: [SPARK-5757][MLLIB] replace SQL JSON usage in ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/4555#issuecomment-74122717 LGTM
[GitHub] spark pull request: SPARK-5747: Fix wordsplitting bugs in make-dis...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/4540#issuecomment-74127622 Thanks for the updates @dyross! One minor comment and I think this is good to go.
[GitHub] spark pull request: SPARK-5747: Fix wordsplitting bugs in make-dis...
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/4540#discussion_r24605724
--- Diff: make-distribution.sh ---
@@ -147,7 +147,7 @@ if [[ ! $JAVA_VERSION =~ 1.6 && -z $SKIP_JAVA_TEST ]]; then
   echo Output from 'java -version' was:
   echo $JAVA_VERSION
   read -p "Would you like to continue anyways? [y,n]: " -r
-  if [[ ! $REPLY =~ ^[Yy]$ ]]; then
+  if [[ ! "$REPLY" =~ ^[Yy]$ ]]; then
--- End diff --
Since we are using the Bash-specific `[[ ... ]]` syntax here, there is actually no word splitting, so the original line should be safe. But I don't think it hurts to quote `$REPLY` anyway.
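The point about `[[ ... ]]` can be checked with a small Bash sketch. `is_yes` is a hypothetical helper written for this illustration and is not part of `make-distribution.sh`; it reuses the regex from the diff and leaves the variable unquoted on purpose.

```shell
#!/usr/bin/env bash
# is_yes is a hypothetical helper, not part of make-distribution.sh.
# It succeeds only when its argument is exactly "y" or "Y".
is_yes() {
  local reply=$1
  # $reply is deliberately left unquoted: inside [[ ... ]] Bash performs
  # neither word splitting nor pathname expansion on it.
  [[ $reply =~ ^[Yy]$ ]]
}

# Values with spaces, an empty string and a glob character are all handled
# as single operands, so none of them breaks the test.
for candidate in "y" "Y" "y es" "" "*"; do
  if is_yes "$candidate"; then
    echo "accepted: '$candidate'"
  else
    echo "rejected: '$candidate'"
  fi
done
```

The same unquoted expansion inside the POSIX `[ ... ]` test would be split into words, which is why quoting remains a reasonable habit even where it is not strictly needed.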
[GitHub] spark pull request: [SPARK-4553] [SPARK-5767] [SQL] Wires Parquet ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/4563#discussion_r24608107
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -106,12 +106,12 @@ class DefaultSource
 ParquetRelation.createEmpty(
--- End diff --
Currently we are still using some utility functions like this one from the old Parquet support code. We can move them into the new data source in the future.
[GitHub] spark pull request: [SPARK-5762] Fix shuffle write time for sort-b...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/4559#issuecomment-74128655 I think this was just an oversight, it's good to include that time. The change looks good to me.
[GitHub] spark pull request: [SPARK-5778] throw if nonexistent metrics conf...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4571#issuecomment-74140479 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27368/
[GitHub] spark pull request: [SPARK-5778] throw if nonexistent metrics conf...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4571#issuecomment-74140463 [Test build #27368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27368/consoleFull) for PR 4571 at commit [`105288e`](https://github.com/apache/spark/commit/105288e2c661ed631e64cfdc3aeb79b2aea858eb). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5655] Don't chmod700 application files ...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4509#issuecomment-74142807 I see, that's because all of those places call `createTempDir(Utils.getLocalDirs(...))`, where `getLocalDirs` calls `getOrCreateLocalDirs`. What about this one, though? https://github.com/apache/spark/blob/99bd5006650bb15ec5465ffee1ebaca81354a3df/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala#L102
[GitHub] spark pull request: [SPARK-5758][SQL] Use LongType as the default ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4544#issuecomment-74143200 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27365/
[GitHub] spark pull request: [SPARK-4553] [SPARK-5767] [SQL] Wires Parquet ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/4563#discussion_r24616376
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -106,12 +106,12 @@ class DefaultSource
 ParquetRelation.createEmpty(
   path,
   data.schema.toAttributes,
-  false,
+  true,
   sqlContext.sparkContext.hadoopConfiguration,
   sqlContext)
 val createdRelation = createRelation(sqlContext, parameters, data.schema)
-createdRelation.asInstanceOf[ParquetRelation2].insert(data, true)
+createdRelation.asInstanceOf[ParquetRelation2].insert(data, mode == SaveMode.Overwrite)
--- End diff --
Use a named parameter for `overwrite`.
[GitHub] spark pull request: [SPARK-5655] Don't chmod700 application files ...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4509#issuecomment-74121548 Yep, looks good. Thanks for fixing this!
[GitHub] spark pull request: ignore cache paths for RAT tests
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4569
[GitHub] spark pull request: [SPARK-5641] Allow spark_ec2.py to copy a wide...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/4487#issuecomment-74126044 @florianverhein Sorry for the delay in getting back. At a high level, adding a flag to specify a `deploy.generic` directory seems better than asking people to copy binaries. However, there are a couple of things that I am still concerned about:
- To do variable substitution we make a copy of files from deploy.generic to a temp directory and then copy files after substitution from the temp directory to the EC2 machine. This might be a bad idea for large rpms / binaries as we will unnecessarily create copies.
- I am not sure why one would want to add an extra module through deploy.generic. Isn't it easier / better to have a fork of spark-ec2 where the module is placed and then just use `--spark-ec2-git-repo`?
- Finally, I am not sure I completely get the case for adding new flags (like `TEST_MODULES`). If these flags need to be set based on EC2 instances, then it looks like one does need to edit `spark_ec2.py`; if not, new flags can be directly added to `ec2-variables.sh` (i.e. by editing the file in the source tree).
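The first concern above, that substitution through a temp directory copies every file, can be illustrated with a rough Bash sketch. Everything in it is an assumption made for illustration: `render_templates`, the `{{master}}` placeholder style, the host name and the file names are hypothetical and are not the code under review in `spark_ec2.py`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: render_templates, the {{master}} placeholder and all
# file names are illustrative assumptions, not taken from spark_ec2.py.
render_templates() {
  local src_dir=$1 out_dir=$2 master=$3
  local f
  for f in "$src_dir"/*; do
    [ -f "$f" ] || continue
    # Every file is rewritten into out_dir, whether or not it contains a
    # placeholder, so large binaries would be duplicated as well.
    sed "s|{{master}}|$master|g" "$f" > "$out_dir/$(basename "$f")"
  done
}

src=$(mktemp -d)   # stands in for the deploy.generic directory
tmp=$(mktemp -d)   # stands in for the intermediate temp directory

printf 'export MASTER={{master}}\n' > "$src/ec2-variables.sh"
printf 'pretend this is a large rpm\n' > "$src/big-package.rpm"

render_templates "$src" "$tmp" "ec2-host.example"

rendered=$(cat "$tmp/ec2-variables.sh")
copied=$(ls "$tmp" | wc -l)
echo "$rendered"
echo "files copied through the temp dir: $copied"

rm -rf "$src" "$tmp"
```

In this sketch the stand-in package is copied even though nothing in it is substituted, which is the extra cost the comment is worried about for large rpms.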
[GitHub] spark pull request: [EC2] Update default Spark version to 1.2.1
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4566#issuecomment-74126055 ok to test
[GitHub] spark pull request: [SPARK-5655] Don't chmod700 application files ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4509#issuecomment-74127553 I take it back. I think this may need to be back-ported to 1.2 as well. Looking into that now.
[GitHub] spark pull request: [SPARK-5758][SQL] Use LongType as the default ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/4544#issuecomment-74128706 test this please
[GitHub] spark pull request: SPARK-5747: Fix wordsplitting bugs in make-dis...
Github user dyross commented on a diff in the pull request: https://github.com/apache/spark/pull/4540#discussion_r24608174
--- Diff: make-distribution.sh ---
@@ -100,7 +100,7 @@ if [ -z $JAVA_HOME ]; then
   if [ $(command -v rpm) ]; then
     RPM_JAVA_HOME=$(rpm -E %java_home 2>/dev/null)
--- End diff --
Done.
[GitHub] spark pull request: SPARK-5747: Fix wordsplitting bugs in make-dis...
Github user dyross commented on a diff in the pull request: https://github.com/apache/spark/pull/4540#discussion_r24608226
--- Diff: make-distribution.sh ---
@@ -147,7 +147,7 @@ if [[ ! $JAVA_VERSION =~ 1.6 && -z $SKIP_JAVA_TEST ]]; then
   echo "Output from 'java -version' was:"
   echo $JAVA_VERSION
   read -p "Would you like to continue anyways? [y,n]: " -r
-  if [[ ! $REPLY =~ ^[Yy]$ ]]; then
+  if [[ ! "$REPLY" =~ ^[Yy]$ ]]; then
--- End diff --
Ok, I'll leave it quoted since it's consistent with other places. But thanks for the note.
[GitHub] spark pull request: [SPARK-5655] Don't chmod700 application files ...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4509#issuecomment-74128977 Yes, this should go into 1.2. Should be a clean merge, though.
[GitHub] spark pull request: [SPARK-5762] Fix shuffle write time for sort-b...
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/4559#discussion_r24608918
--- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala ---
@@ -740,6 +741,8 @@ private[spark] class ExternalSorter[K, V, C](
           in.close()
         }
       }
+      context.taskMetrics.shuffleWriteMetrics.map(
--- End diff --
Good point -- done.
[GitHub] spark pull request: [SPARK-5762] Fix shuffle write time for sort-b...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4559#issuecomment-74136388 [Test build #27369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27369/consoleFull) for PR 4559 at commit [`94e4237`](https://github.com/apache/spark/commit/94e4237fd122508382a95bf4d4309a449b6ac408). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5752][SQL] Don't implicitly convert RDD...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4556#issuecomment-74138221 [Test build #27371 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27371/consoleFull) for PR 4556 at commit [`f755d21`](https://github.com/apache/spark/commit/f755d2129a663f027cade984acfd419281bb4222). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5762] Fix shuffle write time for sort-b...
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/4559#issuecomment-74130494 I had the same thought as @ksakellis re: reading time. Thanks for looking at all of this!
[GitHub] spark pull request: [SPARK-3299][SQL] add to SQLContext API to sho...
Github user bbejeck closed the pull request at: https://github.com/apache/spark/pull/3872
[GitHub] spark pull request: [SPARK-5778] throw if nonexistent metrics conf...
GitHub user ryan-williams opened a pull request: https://github.com/apache/spark/pull/4571
[SPARK-5778] throw if nonexistent metrics config file provided
Previous behavior was to log an error; this is fine in the general case where no `spark.metrics.conf` parameter was specified, in which case a default `metrics.properties` is looked for, and the exception is logged and suppressed if it doesn't exist. If the user has purposefully specified a metrics.conf file, however, it makes more sense to show them an error when said file doesn't exist.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/ryan-williams/spark metrics
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/4571.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #4571
commit 105288e2c661ed631e64cfdc3aeb79b2aea858eb
Author: Ryan Williams <ryan.blake.willi...@gmail.com>
Date: 2015-02-12T19:06:25Z
    throw if nonexistent metrics config file provided
    previous behavior was to log an error; this is fine in the general case where no `spark.metrics.conf` parameter was specified, in which case a default `metrics.properties` is looked for, and the exception logged and suppressed if it doesn't exist. if the user has purposefully specified a metrics.conf file, however, it makes more sense to show them an error when said file doesn't exist.
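The behavior change described in this pull request can be sketched as follows. The sketch is in Python rather than Spark's Scala and is not the patch itself: `load_metrics_properties`, its parameters, and the simple `key=value` parsing are hypothetical names and assumptions chosen for illustration. The point it shows is the asymmetry: a missing default file is logged and ignored, while a missing file that the user named explicitly raises an error.

```python
import logging
import os


def load_metrics_properties(explicit_path=None, default_path="metrics.properties"):
    """Return the properties as a dict; fail only if the user named a missing file."""
    path = explicit_path if explicit_path is not None else default_path
    if not os.path.isfile(path):
        if explicit_path is not None:
            # The user asked for this file on purpose: surface the error.
            raise FileNotFoundError("metrics config file not found: " + path)
        # No file was requested: log and fall back to empty defaults.
        logging.warning("default metrics file %s not found; using defaults", path)
        return {}
    props = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks and comments; keep simple key=value pairs (assumed format).
            if line and not line.startswith("#") and "=" in line:
                key, value = line.split("=", 1)
                props[key.strip()] = value.strip()
    return props
```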
[GitHub] spark pull request: [SPARK-5655] Don't chmod700 application files ...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4509#issuecomment-74138908 @vanzin just a quick question: as of this patch, `createTempDir` no longer does a chmod700. Is that OK? `createTempDir` is used mostly by the HTTP file server, so don't we need to restrict the permissions on the files served there too?
[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4525#issuecomment-74047349 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27342/ Test FAILed.
[GitHub] spark pull request: [SPARK-4553] [SPARK-5767] [SQL] Wires Parquet ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/4563#discussion_r24572199
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -151,9 +151,12 @@ private[parquet] case class PartitionSpec(partitionColumns: StructType, partitio
  * discovery.
  */
 @DeveloperApi
-case class ParquetRelation2
-    (paths: Seq[String], parameters: Map[String, String], maybeSchema: Option[StructType] = None)
-    (@transient val sqlContext: SQLContext)
+case class ParquetRelation2(
+    paths: Seq[String],
+    parameters: Map[String, String],
+    maybeSchema: Option[StructType] = None,
+    maybePartitionSpec: Option[PartitionSpec] = None)(
--- End diff --
Right now only used when converting Hive Metastore Parquet tables. If this is defined, we don't do partition discovery below.
[GitHub] spark pull request: [SPARK-5770] Fix bug: Use addJar() to upload a...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4565#issuecomment-74058430 This change doesn't make sense; it does not seem to relate to the issue you are reporting. It sounds like you are exercising the case where overwrite is enabled, but you've taken out the logic to handle it.
[GitHub] spark pull request: [SPARK-5655] Don't chmod700 application files ...
Github user growse commented on the pull request: https://github.com/apache/spark/pull/4509#issuecomment-74059820 Moving the chmod700 functionality has simplified this somewhat; I have tested it as working on YARN.
[GitHub] spark pull request: [SPARK-5770] Fix bug: Use addJar() to upload a...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4565#issuecomment-74059776 The current code path appears to correctly overwrite the old file with a new one. If you're saying you think it doesn't, have you debugged why? I don't think you can just take this out.
[GitHub] spark pull request: [SPARK-5760][SPARK-5761] Fix standalone rest p...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/4557#discussion_r24563933
--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestClient.scala ---
@@ -155,10 +156,21 @@ private[spark] class StandaloneRestClient extends Logging {
   /**
    * Read the response from the server and return it as a validated [[SubmitRestProtocolResponse]].
    * If the response represents an error, report the embedded message to the user.
+   * Exposed for testing.
    */
-  private def readResponse(connection: HttpURLConnection): SubmitRestProtocolResponse = {
+  private[spark] def readResponse(connection: HttpURLConnection): SubmitRestProtocolResponse = {
     try {
-      val responseJson = Source.fromInputStream(connection.getInputStream).mkString
+      val dataStream =
+        if (connection.getResponseCode == HttpServletResponse.SC_OK) {
+          connection.getInputStream
+        } else {
+          connection.getErrorStream
+        }
+      // If the server threw an exception while writing a response, it will not have a body
+      if (dataStream == null) {
+        throw new SubmitRestProtocolException("Server returned empty body")
+      }
--- End diff --
This fixes (2) on [SPARK-5760](https://issues.apache.org/jira/browse/SPARK-5760)
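The stream-selection logic in the diff above can be sketched in a language-neutral way. The Python below is only an illustration, not the Spark client: `read_response`, `SubmitProtocolError`, and the file-like `ok_stream` / `error_stream` arguments are assumed names standing in for the connection's input and error streams.

```python
import io
import json


class SubmitProtocolError(Exception):
    """Hypothetical stand-in for the client's protocol exception."""


def read_response(status_code, ok_stream, error_stream):
    # On success read the normal body; otherwise read the error body.
    stream = ok_stream if status_code == 200 else error_stream
    # A server that failed while writing its response may send no body at all.
    if stream is None:
        raise SubmitProtocolError("Server returned empty body")
    return json.loads(stream.read().decode("utf-8"))


# Usage: an error response whose JSON body still carries a message.
error_body = io.BytesIO(b'{"success": false, "message": "bad request"}')
print(read_response(400, None, error_body)["message"])  # prints: bad request
```

Reading the error stream on non-200 responses lets the caller report the embedded message, while the explicit check for a missing body turns a would-be null dereference into a descriptive error.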
[GitHub] spark pull request: [SPARK-5760][SPARK-5761] Fix standalone rest p...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/4557#discussion_r24563945
--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala ---
@@ -146,11 +132,7 @@ private abstract class StandaloneRestServlet extends HttpServlet with Logging {
     val message = validateResponse(responseMessage, responseServlet)
     responseServlet.setContentType("application/json")
     responseServlet.setCharacterEncoding("utf-8")
-    responseServlet.setStatus(HttpServletResponse.SC_OK)
-    val content = message.toJson.getBytes(Charsets.UTF_8)
-    val out = new DataOutputStream(responseServlet.getOutputStream)
-    out.write(content)
-    out.close()
+    responseServlet.getWriter.write(message.toJson)
--- End diff --
This fixes (1) on [SPARK-5760](https://issues.apache.org/jira/browse/SPARK-5760)
[GitHub] spark pull request: [SQL] Improve error messages
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/4558
[SQL] Improve error messages
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/marmbrus/spark errorMessages
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/4558.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #4558
commit 6197cd53aebfe4105478b834968397f20e1c0fe0
Author: Michael Armbrust <mich...@databricks.com>
Date: 2015-02-12T07:29:16Z
    [SQL] Better error messages for analysis failures
commit 34eb3a46efe497c34f417b1701e6ef722d57f077
Author: Michael Armbrust <mich...@databricks.com>
Date: 2015-02-12T08:01:11Z
    more work
[GitHub] spark pull request: [SQL] Improve error messages
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4558#issuecomment-74037043 [Test build #27336 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27336/consoleFull) for PR 4558 at commit [`d4e9015`](https://github.com/apache/spark/commit/d4e9015a11922d5e25f824c0839817840073ec35). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SQL] Improve error messages
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/4558#discussion_r24568336
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -66,32 +66,81 @@ class Analyzer(catalog: Catalog,
       typeCoercionRules ++
       extendedRules : _*),
     Batch("Check Analysis", Once,
-      CheckResolution ::
-      CheckAggregation ::
-      Nil: _*),
-    Batch("AnalysisOperators", fixedPoint,
-      EliminateAnalysisOperators)
+      CheckResolution),
+    Batch("Remove SubQueries", fixedPoint,
+      EliminateSubQueries)
   )
   /**
    * Makes sure all attributes and logical plans have been resolved.
    */
   object CheckResolution extends Rule[LogicalPlan] {
+    def failAnalysis(msg: String) = { throw new AnalysisException(msg) }
+
     def apply(plan: LogicalPlan): LogicalPlan = {
-      plan.transformUp {
-        case p if p.expressions.exists(!_.resolved) =>
-          val missing = p.expressions.filterNot(_.resolved).map(_.prettyString).mkString(",")
-          val from = p.inputSet.map(_.name).mkString("{", ", ", "}")
-
-          throw new AnalysisException(s"Cannot resolve '$missing' given input columns $from")
-        case p if !p.resolved && p.childrenResolved =>
-          throw new AnalysisException(s"Unresolved operator in the query plan ${p.simpleString}")
-      } match {
-        // As a backstop, use the root node to check that the entire plan tree is resolved.
-        case p if !p.resolved =>
-          throw new AnalysisException(s"Unresolved operator in the query plan ${p.simpleString}")
-        case p => p
+      // We transform up and order the rules so as to catch the first possible failure instead
+      // of the result of cascading resolution failures.
+      plan.foreachUp {
+        case operator: LogicalPlan =>
+          operator transformAllExpressions {
--- End diff --
We can use `transformExpressionsUp` here to cover the case of ``` SELECT CAST(x AS STRING) FROM src```
[GitHub] spark pull request: [SPARK-5752][SQL] Don't implicitly convert RDD...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4556#issuecomment-74042197 [Test build #27332 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27332/consoleFull) for PR 4556 at commit [`ab58d66`](https://github.com/apache/spark/commit/ab58d66c2238b03328a14711f2e86d7950290088). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait DataFrame extends RDDApi[Row] with Serializable `
[GitHub] spark pull request: [SPARK-5762] Fix shuffle write time for sort-b...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4559#issuecomment-74035371 The shuffle write time is meant to be measured in nanoseconds, right? Also, we should be measuring write time as well for the more common case where we're merging partition files. Unless that's somehow measured already?
[GitHub] spark pull request: [SQL][DOCS] Update sql documentation
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/4560#discussion_r24566626
--- Diff: docs/sql-programming-guide.md ---
@@ -84,12 +84,12 @@ feature parity with a HiveContext.
 <div data-lang="java" markdown="1">
 The entry point into all relational functionality in Spark is the
-[JavaSQLContext](api/scala/index.html#org.apache.spark.sql.api.java.JavaSQLContext) class, or one
+[SQLContext](api/scala/index.html#org.apache.spark.sql.api.SQLContext) class, or one
 of its descendants. To create a basic JavaSQLContext, all you need is a JavaSparkContext.
--- End diff --
You missed this one: JavaSQLContext
[GitHub] spark pull request: Fix make-distribution.sh by adding quotes to $...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4540#issuecomment-74039763 Rather than add yet more JIRAs, let's just make this part of the SPARK-5747 umbrella. We may have two commits, sure. Putting the JIRA in the title, and perhaps addressing a few more instances of this in the script, would be great. @pwendell quick check -- if we make a fix like this, how far back does it get back-ported? 1.2 or further?
[GitHub] spark pull request: [SPARK-5765][Examples]Fixed word split problem...
GitHub user gvramana opened a pull request: https://github.com/apache/spark/pull/4561
[SPARK-5765][Examples]Fixed word split problem in run-example and compute-classpath
Author: Venkata Ramana G <ramana.gollam...@huawei.com>
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/gvramana/spark word_split
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/4561.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #4561
commit 285c8d4cdc96056b87caa4b630efe939a6429d46
Author: Venkata Ramana Gollamudi <ramana.gollam...@huawei.com>
Date: 2015-02-12T09:14:54Z
    Fixed word split problem in run-example and compute-classpath
[GitHub] spark pull request: [SQL] [Minor] Passdown the schema for Parquet ...
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/4562
[SQL] [Minor] Passdown the schema for Parquet File in HiveContext
An empty directory is not allowed for Parquet; for example, querying the following will fail:
```
CREATE TABLE parquet_test (id int, str string) STORED AS PARQUET;
SELECT * FROM parquet_test;
```
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/chenghao-intel/spark parquet_error
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/4562.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #4562
commit 33867c09248b22cca8456268af176f17206a4b74
Author: Cheng Hao <hao.ch...@intel.com>
Date: 2015-02-12T09:19:27Z
    passdown the schema for Parquet File in HiveContext
[GitHub] spark pull request: [SPARK-5762] Fix shuffle write time for sort-b...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4559#issuecomment-74043263 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27337/ Test PASSed.
[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4525#issuecomment-74047341 [Test build #27342 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27342/consoleFull) for PR 4525 at commit [`2e59eb7`](https://github.com/apache/spark/commit/2e59eb73086ad2c9243f6d1f340707841190d2d3). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4525#issuecomment-74048388 [Test build #27343 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27343/consoleFull) for PR 4525 at commit [`c1637e3`](https://github.com/apache/spark/commit/c1637e3335620ed8aac39dfeb3f9fe1252abfadd). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.