Can you send me the output of those two commands?

On Fri, May 1, 2020 at 1:46 PM Xiangyu Li <yisky...@gmail.com> wrote:

> Hi Holden,
>
> Please check the second email of mine in this email chain. I did that
> originally and to quote my email:
>
>
> ===========================================================================================
> In the spark-2.4.5 src directory, I just did a simple:
>
> `./build/mvn -DskipTests clean package`
>
>
> And then went to the python directory and did:
>
>
> `python setup.py sdist` followed by `pip install
> dist/pyspark-2.4.5.tar.gz` (as mentioned in the make-distribution.sh.)
>
>
> *This ran into "error: package directory `deps/jars` does not exist".*
>
>
> ===============================================================================================
>
>
> So I did exactly what you said, which is also what one of the messages
> printed by the make-distribution.sh script suggests.
>
> On Fri, May 1, 2020 at 4:39 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> Your problem isn't the missing license per-se (that just happens to be
>> the first error).
>>
>> I don't believe that is the way we expect users to pip install the Python
>> library. pip will only install directories/targets underneath the directory
>> where setup.py lives, hence the deps directory, which is constructed by
>> setup.py with a bunch of symlinks. It assumes that you are either building
>> Spark from source, in which case you should follow its instructions:
>>
>>     To build Spark with maven you can run:
>>       ./build/mvn -DskipTests clean package
>>     Building the source dist is done in the Python directory:
>>       cd python
>>       python setup.py sdist
>>       pip install dist/*.tar.gz
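
Since the deps directory is described above as a set of symlinks built by setup.py, one thing worth checking before `python setup.py sdist` is that the Maven build actually left jars to link to. A minimal sketch, assuming the jars land under `assembly/target/scala-*/jars` in the source tree (that layout and the helper name `find_built_jars` are assumptions, not something stated in this thread):

```shell
#!/bin/sh
# Hypothetical helper, not part of Spark: list the jar directories that a
# Maven build is assumed to leave under <spark-src>/assembly/target/scala-*/jars.
# Prints each directory found; returns non-zero if none exist.
find_built_jars() {
  spark_src="$1"
  found=1
  for d in "$spark_src"/assembly/target/scala-*/jars; do
    # If the glob matches nothing, $d is the literal pattern and -d fails.
    if [ -d "$d" ]; then
      echo "$d"
      found=0
    fi
  done
  return "$found"
}

# Usage, from the top of the Spark source tree after the Maven build:
#   find_built_jars "$PWD" || echo "no built jars found; run the Maven build first"
```

If this prints nothing, the sdist step has nothing to package, which may or may not be related to the `deps/jars` error reported in this thread.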
>>
>>
>> On Fri, May 1, 2020 at 1:32 PM Xiangyu Li <yisky...@gmail.com> wrote:
>>
>>> make-distribution.sh with --pip would run a `python setup.py sdist`
>>> within that make-distribution.sh script.
>>> I also tested `make-distribution.sh` without --pip, and the same error
>>> happens.
>>>
>>> Correct me if I'm wrong, but the pyspark binary has always built
>>> successfully; it is the pyspark pip package that is failing.
>>>
>>> On Fri, May 1, 2020 at 4:23 PM Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> Hm, others may have to chime in here. Either that's not how you create
>>>> the pyspark binary from the source release (make-distribution.sh doesn't do
>>>> that?) or there is a small but important issue here, that the source
>>>> release doesn't contain one thing that the binary release script expects,
>>>> which is LICENSE-binary et al. If it's the latter, we could move around the
>>>> LICENSE bits in the source tree so that both are "source" files included in
>>>> the source release, so you can make the binary release with it, but, I'd
>>>> probably say it's easier/better to simply skip adding the license in this
>>>> path (if it's supposed to work this way at all) as the use case, a custom
>>>> derived work, doesn't need the *ASF's* license statement.
>>>>
>>>>
>>>> On Fri, May 1, 2020 at 3:13 PM Xiangyu Li <yisky...@gmail.com> wrote:
>>>>
>>>>> To reproduce this, I just did
>>>>>
>>>>> curl -O http://www.trieuvan.com/apache/spark/spark-2.4.5/spark-2.4.5.tgz
>>>>> tar xzf spark-2.4.5.tgz
>>>>> cd spark-2.4.5
>>>>> ./dev/make-distribution.sh --name custom-spark --pip --tgz -Phadoop-2.7
>>>>> mv spark-2.4.5-bin-custom-spark.tgz ../
>>>>> cd ..
>>>>> tar xzf spark-2.4.5-bin-custom-spark.tgz
>>>>> cd spark-2.4.5-bin-custom-spark/python/
>>>>> sudo python setup.py install
>>>>>
>>>>> And here is the output:
>>>>> [image: image.png]
>>>>>
>>>>>
>>>>> On Fri, May 1, 2020 at 2:48 PM Sean Owen <sro...@gmail.com> wrote:
>>>>>
>>>>>> You wrote:
>>>>>>
>>>>>> "
>>>>>> 2. On each machine, I can install pyspark by running `python setup.py
>>>>>> install` inside the python directory.
>>>>>>
>>>>>> Step 2 would fail because of missing the licenses directory.
>>>>>> "
>>>>>>
>>>>>> That shouldn't depend on the license file, and the script you showed
>>>>>> does not fail when not present, so I am wondering what this means.
>>>>>> I'm not sure there's a JIRA here yet.
>>>>>>
>>>>>> On Fri, May 1, 2020 at 1:46 PM Xiangyu Li <yisky...@gmail.com> wrote:
>>>>>>
>>>>>>> Hmm, sorry, I don't get which part of my email you were referring to
>>>>>>> when you said "the build fails?".
>>>>>>>
>>>>>>> So I am trying to build a custom spark binary distribution with,
>>>>>>> say, different Hadoop versions and R support.
>>>>>>>
>>>>>>> Then I stored this custom build on S3, so as I am building more
>>>>>>> machines I can just directly download this custom build from S3. But
>>>>>>> besides spark-submit and what not, I also wanted to install the pyspark
>>>>>>> python package to the machine I am building.
>>>>>>>
>>>>>>> The lack of the LICENSE file in the custom build would prevent
>>>>>>> pyspark from being successfully built.
>>>>>>>
>>>>>>> Hopefully this answers your question.
>>>>>>>
>>>>>>> The second part of my last email was about building pyspark inside the
>>>>>>> spark source directory. I will raise an issue on Jira for that, as it is
>>>>>>> more of a clear-cut problem with the documentation on the website and the
>>>>>>> comments in make-distribution.sh.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, May 1, 2020 at 1:31 PM Sean Owen <sro...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hm, the build fails? You can see this is just skipped if not
>>>>>>>> present, for this reason.
>>>>>>>> I'm not clear why you need the file for its own sake, for your own
>>>>>>>> internal modification that you don't redistribute.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, May 1, 2020 at 11:43 AM Xiangyu Li <yisky...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Sean,
>>>>>>>>>
>>>>>>>>> Thanks for the quick response! Yes, what you described about how
>>>>>>>>> LICENSE file should be distributed makes sense.
>>>>>>>>>
>>>>>>>>> The reason I learned about this is that I was trying to build
>>>>>>>>> spark-2.4.5-bin-custom.tgz, then distribute this build to multiple
>>>>>>>>> machines, so that:
>>>>>>>>>
>>>>>>>>> 1. These machines can run spark with the build.
>>>>>>>>> 2. On each machine, I can install pyspark by running `python
>>>>>>>>> setup.py install` inside the python directory.
>>>>>>>>>
>>>>>>>>> Step 2 would fail because of missing the licenses directory.
>>>>>>>>>
>>>>>>>>> Building pyspark out of a binary distribution is a bit
>>>>>>>>> unconventional, but I did this after failing to do what the official doc
>>>>>>>>> recommended (
>>>>>>>>> https://spark.apache.org/docs/latest/building-spark.html#pyspark-pip-installable),
>>>>>>>>> so taking a step back to describe what I did originally:
>>>>>>>>>
>>>>>>>>> In the spark-2.4.5 src directory, I just did a simple:
>>>>>>>>>
>>>>>>>>> `./build/mvn -DskipTests clean package`
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> And then went to the python directory and did:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> `python setup.py sdist` followed by `pip install
>>>>>>>>> dist/pyspark-2.4.5.tar.gz` (as mentioned in the
>>>>>>>>> make-distribution.sh.)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This ran into "error: package directory `deps/jars` does
>>>>>>>>> not exist".
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> However, directly running
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> `sudo python setup.py install`
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> worked.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, May 1, 2020 at 11:30 AM Sean Owen <sro...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> The source distribution has the source LICENSE file. The binary
>>>>>>>>>> distribution has the LICENSE-binary license file. The source release isn't
>>>>>>>>>> supposed to have LICENSE-binary as it would not be accurate for that
>>>>>>>>>> release; LICENSE is. If you're redistributing a build, you'll have your own
>>>>>>>>>> process for modifying and building it, including modifying the LICENSE file
>>>>>>>>>> as appropriate; these LICENSE files represent what the project delivers to
>>>>>>>>>> you rather than what you deliver to others. You could get the
>>>>>>>>>> LICENSE-binary file from the right hash commit from git, if desired, as
>>>>>>>>>> part of your build.
>>>>>>>>>>
>>>>>>>>>> On Fri, May 1, 2020 at 10:19 AM Xiangyu Li <yisky...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I downloaded spark-2.4.5 source from
>>>>>>>>>>> https://mirrors.ocf.berkeley.edu/apache/spark/spark-2.4.5/spark-2.4.5.tgz
>>>>>>>>>>> After extracting it and running:
>>>>>>>>>>>
>>>>>>>>>>> ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> It creates a Spark binary distribution named:
>>>>>>>>>>> spark-2.4.5-bin-custom-spark.tgz
>>>>>>>>>>>
>>>>>>>>>>> So this file is supposedly a ready-to-distribute Spark binary
>>>>>>>>>>> file like the one you can download from
>>>>>>>>>>> http://mirror.metrocast.net/apache/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
>>>>>>>>>>>
>>>>>>>>>>> However, one big difference between this custom build and the
>>>>>>>>>>> official build is that you do not have a LICENSE file in the custom build.
>>>>>>>>>>> I don't know much about Apache license, but I would suppose a custom build
>>>>>>>>>>> distribution should have one.
>>>>>>>>>>>
>>>>>>>>>>> The file is missing because of the following code in
>>>>>>>>>>> make-distribution.sh:
>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>
>>>>>>>>>>> There is no LICENSE-binary file in the official spark-2.4.5.tgz
>>>>>>>>>>> file, therefore there will be no LICENSE file in your custom build.
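
The code in the screenshot above is not preserved in this archive. As a rough sketch of the behaviour described in this thread (LICENSE-binary is copied to LICENSE when present, and the step is skipped otherwise), the logic might look something like the following. The function name, its arguments, and the message text are hypothetical, and this is not the actual text of make-distribution.sh:

```shell
#!/bin/sh
# Sketch only: mirrors the behaviour described in this thread, not the real
# script. copy_binary_license and its arguments are hypothetical names.
copy_binary_license() {
  src_dir="$1"   # Spark source tree
  dist_dir="$2"  # distribution directory being assembled
  if [ -e "$src_dir/LICENSE-binary" ]; then
    # Binary license present: ship it as LICENSE in the distribution.
    cp "$src_dir/LICENSE-binary" "$dist_dir/LICENSE"
  else
    # Source release without LICENSE-binary: skip instead of failing.
    echo "LICENSE-binary not found; skipping LICENSE copy"
  fi
}
```

With a source tarball that has no LICENSE-binary, a function like this takes the skip branch, so the resulting distribution ends up without a LICENSE file.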
>>>>>>>>>>>
>>>>>>>>>>> I am aware of two pull requests related to this:
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/apache/spark/pull/22436
>>>>>>>>>>> started to use LICENSE-binary instead of just the LICENSE.
>>>>>>>>>>>
>>>>>>>>>>> And
>>>>>>>>>>> https://github.com/apache/spark/pull/22840
>>>>>>>>>>> To avoid failure when there is no LICENSE-binary in spark-2.4.5
>>>>>>>>>>> source directory.
>>>>>>>>>>>
>>>>>>>>>>> I think we need to change make-distribution.sh to make sure that
>>>>>>>>>>> the LICENSE file is copied over to its corresponding custom build
>>>>>>>>>>> distribution. However, I am not ready to do a pull request, so hopefully we
>>>>>>>>>>> can discuss it here first.
>>>>>>>>>>> --
>>>>>>>>>>> Sincerely
>>>>>>>>>>> Xiangyu Li
>>>>>>>>>>>
>>>>>>>>>>> <yisky...@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Sincerely
>>>>>>>>> Xiangyu Li
>>>>>>>>>
>>>>>>>>> <yisky...@gmail.com>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Sincerely
>>>>>>> Xiangyu Li
>>>>>>>
>>>>>>> <yisky...@gmail.com>
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Sincerely
>>>>> Xiangyu Li
>>>>>
>>>>> <yisky...@gmail.com>
>>>>>
>>>>
>>>
>>> --
>>> Sincerely
>>> Xiangyu Li
>>>
>>> <yisky...@gmail.com>
>>>
>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>
>
> --
> Sincerely
> Xiangyu Li
>
> <yisky...@gmail.com>
>


-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
