[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3480#issuecomment-65179375 Merged into master and branch-1.2. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3480
[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3480#issuecomment-65031620 @pwendell The example data do not need to be on the classpath. They are sample data files used by the mllib examples, e.g., BinaryClassification and MovieLensALS. Usually the example code is the starting point for users. @srowen's change makes it easy to run the examples:

1. Download and unzip the distribution zip.
2. Run `bin/run-example mllib.DatasetExample`, which will read a file under `data/` by default.

The change looks good to me.
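To illustrate the "reads a file under `data/` by default" behavior described above, here is a minimal sketch in plain Java (no Spark dependency). The class name `DefaultInputExample`, the method `resolveInput`, and the file name `data/mllib/sample_data.txt` are hypothetical and chosen for illustration only. The thread says only that the example reads some file under `data/`.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class DefaultInputExample {
    // Hypothetical default path; the thread only states that a file under data/ is read.
    static final String DEFAULT_INPUT = "data/mllib/sample_data.txt";

    /** Returns the first argument if one is given, otherwise the default path under data/. */
    static Path resolveInput(String[] args, Path workingDir) {
        String input = args.length > 0 ? args[0] : DEFAULT_INPUT;
        // resolve() keeps an absolute argument as-is and anchors a relative one at workingDir.
        return workingDir.resolve(input).normalize();
    }

    public static void main(String[] args) {
        Path cwd = Paths.get("").toAbsolutePath();
        System.out.println("Would read: " + resolveInput(args, cwd));
    }
}
```

With a relative default like this, the example only works out of the box if `data/` sits at the root of the unpacked distribution, which is the consistency argument made in this thread.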
[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3480#issuecomment-65020606 Oh I see - yeah, I meant we'd also rewrite the examples to correctly load example data from the classpath. If something is in `src/main/resources`, you can just look for the resource using Java's resource API. However, maybe this makes the examples too confusing. I could see someone getting tripped up on loading a file from a classpath resource (a fairly advanced concept). Also, if the examples do `sc.textFile(/path/to/example)` or something, that won't work. So maybe my suggestion is at odds with having simple, easy-to-understand examples.
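For readers unfamiliar with the classpath-resource approach discussed above, the following is a minimal sketch using only the JDK (`ClassLoader.getResourceAsStream`). It is not code from the PR or from Spark. The class `ClasspathDataExample`, its methods, and the file name `sample_data.txt` are hypothetical, and the demo builds a throwaway classpath root in a temp directory so that it can run on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ClasspathDataExample {

    /** Reads every line of a classpath resource; fails clearly if the resource is missing. */
    static List<String> readResourceLines(ClassLoader loader, String name) throws IOException {
        InputStream in = loader.getResourceAsStream(name);
        if (in == null) {
            throw new IOException("Resource not found on classpath: " + name);
        }
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
            return lines;
        }
    }

    /** Creates a temporary classpath root with one hypothetical data file, then loads it. */
    static List<String> demo() {
        try {
            Path root = Files.createTempDirectory("example-resources");
            Files.write(root.resolve("sample_data.txt"),
                        Arrays.asList("1 0.5", "0 1.5"), StandardCharsets.UTF_8);
            // The directory URL stands in for src/main/resources packaged on the classpath.
            try (URLClassLoader loader = new URLClassLoader(new URL[] {root.toUri().toURL()})) {
                return readResourceLines(loader, "sample_data.txt");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Note that this yields a stream rather than a filesystem path. An API that expects a path string therefore cannot consume a resource packed inside a .jar directly, which is the friction raised in the comment above.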
[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3480#issuecomment-64981160 @pwendell Sure, I agree that the size isn't a big deal, and there's not really a case where you would use the distribution without the examples .jar. Forget the size issue. The real point is that none of the code would read the data files from `src/main/resources`, nor would any of the examples as given in their documentation. They all refer to `data/`, which is where the files are in the source tree and in the source distribution. This change makes the binary distribution consistent with those, and with all the documented examples. If there's interest, of course I can make another PR to move these files and update all the docs to look for `examples/src/main/resources/...` Heck, you could even write the examples to look for the data files within the .jar instead.
[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3480#issuecomment-64976248 Hey Sean - I don't quite understand. The only use of the examples project is to produce the assembly jar for use in distributions, so it seems legitimate to include them as resources for that project. Putting them in the jar would not increase the total size of the distribution, it would just relocate them to being inside of the jar. The examples jar is not used outside of this context, so embedding more data in there doesn't matter, from what I can tell. We actually removed examples from the set of published jars for 1.2. It was sort of weird that we were publishing it since there is no public API in there and just standalone programs.
[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3480#issuecomment-64918140 Generally, yes, I'd put resources in `src/main/resources`. These aren't in that location, and the examples are written to expect them under `data/` at the root level. I suppose putting them in `src/main/resources` would cause them to be built into the .jar file, and they're not tiny. That is, they're not actually used as resources in this way. So I left them where they are and just put them into the tarball in the same place the source tree and examples expect them.
[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3480#issuecomment-64917713 Hey Sean - any reason not to put these in `src/main/resources` within the examples module? This is what Spark SQL does and it seems like a better model. That location is included in the dist already.
[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3480#issuecomment-64657490 [Test build #23894 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23894/consoleFull) for PR 3480 at commit [`47688f1`](https://github.com/apache/spark/commit/47688f132bd342bce1096480b83b1d27aa8bfd04).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3480#issuecomment-64657504 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23894/ Test PASSed.
[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3480#issuecomment-64608592 [Test build #23894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23894/consoleFull) for PR 3480 at commit [`47688f1`](https://github.com/apache/spark/commit/47688f132bd342bce1096480b83b1d27aa8bfd04). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/3480

SPARK-2192 [BUILD] Examples Data Not in Binary Distribution

Simply, add data/ to distributions. This adds about 291KB (compressed) to the tarball, FYI.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srowen/spark SPARK-2192

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3480.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3480

commit 47688f132bd342bce1096480b83b1d27aa8bfd04
Author: Sean Owen
Date: 2014-11-26T13:33:14Z

    Add data/ to distributions