[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...

2014-12-01 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3480#issuecomment-65179375
  
Merged into master and branch-1.2. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...

2014-12-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3480





[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...

2014-12-01 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3480#issuecomment-65031620
  
@pwendell The example data do not need to be on the classpath. They are
sample data files used by MLlib examples, e.g., BinaryClassification and
MovieLensALS. Usually the example code is the starting point for users.
@srowen's change makes it easy to run examples:

1. download and unzip the distribution zip
2. run `bin/run-example mllib.DatasetExample`, which will read a file under 
`data/` by default.

The change looks good to me.





[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...

2014-11-30 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3480#issuecomment-65020606
  
Oh I see - yeah, I meant we'd also rewrite the examples to correctly load
example data from the classpath. If something is in `src/main/resources`, you
can just look for the resource using Java's resource API. However, maybe this
makes the examples too confusing. I could see someone getting tripped up on
loading a file from a classpath resource (a fairly advanced concept). Also, if
the examples do `sc.textFile("/path/to/example")` or something, that won't work
for data packaged inside the jar. So maybe my suggestion is at odds with having
simple, easy-to-understand examples.
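
To make the trade-off above concrete, here is a minimal sketch of the two lookup styles in plain Java. It is an illustration only: the class `ExampleDataLoading` and its method names are hypothetical and not part of Spark, and no Spark API is used. It assumes UTF-8 text data.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark: contrasts the two ways of locating example data.
final class ExampleDataLoading {

    // Filesystem lookup: what an example does when it is handed a path under data/.
    static List<String> readFromPath(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    // Classpath lookup: the name is resolved by the class loader, so the data
    // may live inside a jar and has no filesystem path of its own.
    static List<String> readFromClasspath(ClassLoader loader, String name) throws IOException {
        InputStream in = loader.getResourceAsStream(name);
        if (in == null) {
            throw new IOException("Resource not found on classpath: " + name);
        }
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

The classpath variant needs a class loader and a null check that the path variant does not, which is roughly the extra concept a newcomer would have to absorb.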





[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...

2014-11-30 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3480#issuecomment-64981160
  
@pwendell Sure, I agree that the size isn't a big deal, and there's not 
really a case where you would use the distribution without the examples .jar. 
Forget the size issue.

Really it is that none of the code would read the data files from 
`src/main/resources`, nor would any of the examples as given in their 
documentation. They all refer to `data/`, which is where the files are in the 
source tree and in the source distribution. This makes the binary distribution 
consistent with those, and with all the documented examples.

If there's interest, of course I can make another PR to move these files
and update all the docs to look for `examples/src/main/resources/...`. Heck,
you could even write the examples to look for the data files within the .jar
instead.
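
As a rough sketch of what reading data from within the .jar could involve: a path-based reader needs a real file, so one option is to copy the resource out to a temporary file first. The class `JarDataExtractor` and its method are hypothetical names used for illustration, not anything in Spark, and the resulting file exists only on the machine that ran the copy, so this assumes a local run.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Spark.
final class JarDataExtractor {

    // Copies a classpath resource to a temporary file and returns its path,
    // so that code expecting a filesystem path can read it.
    static Path extractToTempFile(ClassLoader loader, String name) throws IOException {
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + name);
            }
            // createTempFile already creates an empty file, hence REPLACE_EXISTING.
            Path target = Files.createTempFile("example-data-", ".tmp");
            target.toFile().deleteOnExit();
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        }
    }
}
```

Compared with simply shipping `data/` in the distribution, this adds a copy step to every example that wants a file path.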





[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...

2014-11-29 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3480#issuecomment-64976248
  
Hey Sean - I don't quite understand. The only use of the examples project
is to produce the assembly jar for use in distributions, so it seems legitimate
to include the data files as resources for that project. Putting them in the
jar would not increase the total size of the distribution; it would just move
them inside the jar.

The examples jar is not used outside of this context, so embedding more
data in there doesn't matter, from what I can tell. We actually removed
examples from the set of published jars for 1.2. It was sort of weird that we
were publishing it, since it contains no public API, just standalone
programs.





[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...

2014-11-28 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3480#issuecomment-64918140
  
Generally, yes, I'd put resources in `src/main/resources`. These aren't in
that location, and the examples are written to expect them under `data/` at the
root level. I suppose putting them in `src/main/resources` would cause them to
be built into the .jar file, and they're not tiny. That is, they're not actually
used as resources in this way. So I left them where they are and just put them
into the tarball in the same place the source tree and examples expect them.





[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...

2014-11-28 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3480#issuecomment-64917713
  
Hey Sean - any reason not to put these in `src/main/resources` within the
examples module? This is what Spark SQL does, and it seems like a better model.
That location is included in the dist already.





[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...

2014-11-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3480#issuecomment-64657490
  
  [Test build #23894 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23894/consoleFull) for PR 3480 at commit [`47688f1`](https://github.com/apache/spark/commit/47688f132bd342bce1096480b83b1d27aa8bfd04).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...

2014-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3480#issuecomment-64657504
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23894/





[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...

2014-11-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3480#issuecomment-64608592
  
  [Test build #23894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23894/consoleFull) for PR 3480 at commit [`47688f1`](https://github.com/apache/spark/commit/47688f132bd342bce1096480b83b1d27aa8bfd04).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-2192 [BUILD] Examples Data Not in Binary...

2014-11-26 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/3480

SPARK-2192 [BUILD] Examples Data Not in Binary Distribution

Simply, add data/ to distributions. This adds about 291KB (compressed) to 
the tarball, FYI.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-2192

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3480.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3480


commit 47688f132bd342bce1096480b83b1d27aa8bfd04
Author: Sean Owen 
Date:   2014-11-26T13:33:14Z

Add data/ to distributions



