Hi Mich,

By default, pip caches downloaded packages to somewhere like $HOME/.cache/pip. So after doing any "pip install", you'll want to either delete that directory, or pass the "--no-cache-dir" option to pip to prevent the downloaded packages from being added to the image.
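For example, a minimal Dockerfile sketch (the package names are just placeholders, not from the Spark build scripts):

    # Option 1: never populate the cache in the first place
    RUN pip install --no-cache-dir numpy pandas

    # Option 2: remove the cache in the same RUN step, so it never lands in a layer
    RUN pip install numpy pandas && \
        rm -rf /root/.cache/pip

Note the rm -rf has to happen in the same RUN statement as the install; deleting the cache in a later layer does not shrink the image, because the earlier layer still carries the files.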
HTH

Andrew

On Tue, Aug 17, 2021 at 2:29 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi Andrew,
>
> Can you please elaborate on blowing away the pip cache before committing the layer?
>
> Thanks,
>
> Mich
>
> On Tue, 17 Aug 2021 at 16:57, Andrew Melo <andrew.m...@gmail.com> wrote:
>
>> Silly Q, did you blow away the pip cache before committing the layer?
>> That always trips me up.
>>
>> Cheers
>> Andrew
>>
>> On Tue, Aug 17, 2021 at 10:56 Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> With no additional Python packages etc. we get 1.4GB, compared to 2.19GB before:
>>>
>>> REPOSITORY       TAG                                      IMAGE ID       CREATED                  SIZE
>>> spark/spark-py   3.1.1_sparkpy_3.7-scala_2.12-java8only   faee4dbb95dd   Less than a second ago   1.41GB
>>> spark/spark-py   3.1.1_sparkpy_3.7-scala_2.12-java8       ba3c17bc9337   4 hours ago              2.19GB
>>>
>>> root@233a81199b43:/opt/spark/work-dir# pip list
>>> Package        Version
>>> -------------  -------
>>> asn1crypto     0.24.0
>>> cryptography   2.6.1
>>> entrypoints    0.3
>>> keyring        17.1.1
>>> keyrings.alt   3.1.1
>>> pip            21.2.4
>>> pycrypto       2.6.1
>>> PyGObject      3.30.4
>>> pyxdg          0.25
>>> SecretStorage  2.3.1
>>> setuptools     57.4.0
>>> six            1.12.0
>>> wheel          0.32.3
>>>
>>> HTH
>>>
>>> On Tue, 17 Aug 2021 at 16:24, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>>> Yes, I will double check. It includes Java 8 in addition to the base Java 11.
>>>>
>>>> In addition, it has these Python packages (added for my own needs for now):
>>>>
>>>> root@ce6773017a14:/opt/spark/work-dir# pip list
>>>> Package        Version
>>>> -------------  -------
>>>> asn1crypto     0.24.0
>>>> cryptography   2.6.1
>>>> cx-Oracle      8.2.1
>>>> entrypoints    0.3
>>>> keyring        17.1.1
>>>> keyrings.alt   3.1.1
>>>> numpy          1.21.2
>>>> pip            21.2.4
>>>> py4j           0.10.9
>>>> pycrypto       2.6.1
>>>> PyGObject      3.30.4
>>>> pyspark        3.1.2
>>>> pyxdg          0.25
>>>> PyYAML         5.4.1
>>>> SecretStorage  2.3.1
>>>> setuptools     57.4.0
>>>> six            1.12.0
>>>> wheel          0.32.3
>>>>
>>>> HTH
>>>>
>>>> On Tue, 17 Aug 2021 at 16:17, Maciej <mszymkiew...@gmail.com> wrote:
>>>>
>>>>> Quick question ‒ is this actual output? If so, do we know what accounts for the 1.5GB overhead of the PySpark image? Even without --no-install-recommends this seems like a lot (if I recall correctly, it was around 400MB for the existing images).
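>>>>> One way to narrow it down would be to compare the layers of the two images, e.g. (just a sketch, using the tags above):
>>>>>
>>>>>     docker history spark/spark-py:3.1.1_sparkpy_3.7-scala_2.12-java8
>>>>>
>>>>> which lists each layer with the command that created it and its size, so an oversized RUN step (a pip cache, apt lists, etc.) should stand out.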
>>>>>
>>>>> On 8/17/21 2:24 PM, Mich Talebzadeh wrote:
>>>>>
>>>>> Examples:
>>>>>
>>>>> *docker images*
>>>>>
>>>>> REPOSITORY       TAG                                  IMAGE ID       CREATED          SIZE
>>>>> spark/spark-py   3.1.1_sparkpy_3.7-scala_2.12-java8   ba3c17bc9337   2 minutes ago    2.19GB
>>>>> spark            3.1.1-scala_2.12-java11              4595c4e78879   18 minutes ago   635MB
>>>>>
>>>>> On Tue, 17 Aug 2021 at 10:31, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> 3.1.2_sparkpy_3.7-scala_2.12-java11
>>>>>> 3.1.2_sparkR_3.6-scala_2.12-java11
>>>>>>
>>>>>> Yes, let us go with that, and remember that we can change the tags anytime. The accompanying release note should detail what is inside the downloaded image.
>>>>>>
>>>>>> +1 for me
>>>>>>
>>>>>> On Tue, 17 Aug 2021 at 09:51, Maciej <mszymkiew...@gmail.com> wrote:
>>>>>>
>>>>>>> On 8/17/21 4:04 AM, Holden Karau wrote:
>>>>>>>
>>>>>>> These are some really good points all around.
>>>>>>>
>>>>>>> I think, in the interest of simplicity, we'll start with just the 3 current Dockerfiles in the Spark repo, but for the next release (3.3) we should explore adding some more Dockerfiles/build options.
>>>>>>>
>>>>>>> Sounds good.
>>>>>>>
>>>>>>> However, I'd consider adding the guest language version to the tag names, i.e.
>>>>>>>
>>>>>>> 3.1.2_sparkpy_3.7-scala_2.12-java11
>>>>>>> 3.1.2_sparkR_3.6-scala_2.12-java11
>>>>>>>
>>>>>>> and some basic safeguards in the layers, to make sure that these are really the versions we use.
>>>>>>>
>>>>>>> On Mon, Aug 16, 2021 at 10:46 AM Maciej <mszymkiew...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I have a few concerns regarding the PySpark and SparkR images.
>>>>>>>>
>>>>>>>> First of all, how do we plan to handle interpreter versions? Ideally, we should provide images for all supported variants, but based on the preceding discussion and the proposed naming convention, I assume that is not going to happen. If that's the case, it would be great if we could fix interpreter versions based on some support criteria (lowest supported, lowest non-deprecated, highest supported at the time of release, etc.)
>>>>>>>>
>>>>>>>> Currently, we use the following:
>>>>>>>>
>>>>>>>> - for R, the buster-cran35 Debian repositories, which install R 3.6 (the provided version already changed in the past and broke the image build ‒ SPARK-28606);
>>>>>>>> - for Python, the system-provided python3 packages, which currently ship Python 3.7;
>>>>>>>>
>>>>>>>> neither of which guarantees stability over time, and both might be hard to synchronize with our support matrix.
>>>>>>>>
>>>>>>>> Secondly, omitting libraries which are required for full functionality and performance, specifically
>>>>>>>>
>>>>>>>> - NumPy, Pandas and Arrow for PySpark
>>>>>>>> - Arrow for SparkR
>>>>>>>>
>>>>>>>> is likely to severely limit the usability of the images (out of these, Arrow is probably the hardest to manage, especially when you already depend on system packages to provide the R or Python interpreter).
>>>>>>>>
>>>>>>>> On 8/14/21 12:43 AM, Mich Talebzadeh wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We can cater for multiple types (spark, spark-py and spark-r) and Spark versions (assuming they are downloaded and available). The challenge is that these Docker images are snapshots once built. They cannot be amended later: if you change anything inside a running container, whatever you did is lost as soon as you log out.
>>>>>>>>
>>>>>>>> For example, say I want to add tensorflow to my Docker image. These are my images:
>>>>>>>>
>>>>>>>> REPOSITORY                             TAG           IMAGE ID       CREATED      SIZE
>>>>>>>> eu.gcr.io/axial-glow-224522/spark-py   java8_3.1.1   cfbb0e69f204   5 days ago   2.37GB
>>>>>>>> eu.gcr.io/axial-glow-224522/spark      3.1.1         8d1bf8e7e47d   5 days ago   805MB
>>>>>>>>
>>>>>>>> Using the image ID, I log in to the image as root:
>>>>>>>>
>>>>>>>> *docker run -u0 -it cfbb0e69f204 bash*
>>>>>>>>
>>>>>>>> root@b542b0f1483d:/opt/spark/work-dir# pip install keras
>>>>>>>> Collecting keras
>>>>>>>>   Downloading keras-2.6.0-py2.py3-none-any.whl (1.3 MB)
>>>>>>>>      |████████████████████████████████| 1.3 MB 1.1 MB/s
>>>>>>>> Installing collected packages: keras
>>>>>>>> Successfully installed keras-2.6.0
>>>>>>>> WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager.
>>>>>>>> It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
>>>>>>>>
>>>>>>>> root@b542b0f1483d:/opt/spark/work-dir# pip list
>>>>>>>> Package        Version
>>>>>>>> -------------  -------
>>>>>>>> asn1crypto     0.24.0
>>>>>>>> cryptography   2.6.1
>>>>>>>> cx-Oracle      8.2.1
>>>>>>>> entrypoints    0.3
>>>>>>>> *keras          2.6.0    <--- it is here*
>>>>>>>> keyring        17.1.1
>>>>>>>> keyrings.alt   3.1.1
>>>>>>>> numpy          1.21.1
>>>>>>>> pip            21.2.3
>>>>>>>> py4j           0.10.9
>>>>>>>> pycrypto       2.6.1
>>>>>>>> PyGObject      3.30.4
>>>>>>>> pyspark        3.1.2
>>>>>>>> pyxdg          0.25
>>>>>>>> PyYAML         5.4.1
>>>>>>>> SecretStorage  2.3.1
>>>>>>>> setuptools     57.4.0
>>>>>>>> six            1.12.0
>>>>>>>> wheel          0.32.3
>>>>>>>> root@b542b0f1483d:/opt/spark/work-dir# exit
>>>>>>>>
>>>>>>>> Now I exit the container and log in again:
>>>>>>>>
>>>>>>>> (pyspark_venv) hduser@rhes76: /home/hduser/dba/bin/build> docker run -u0 -it cfbb0e69f204 bash
>>>>>>>>
>>>>>>>> root@5231ee95aa83:/opt/spark/work-dir# pip list
>>>>>>>> Package        Version
>>>>>>>> -------------  -------
>>>>>>>> asn1crypto     0.24.0
>>>>>>>> cryptography   2.6.1
>>>>>>>> cx-Oracle      8.2.1
>>>>>>>> entrypoints    0.3
>>>>>>>> keyring        17.1.1
>>>>>>>> keyrings.alt   3.1.1
>>>>>>>> numpy          1.21.1
>>>>>>>> pip            21.2.3
>>>>>>>> py4j           0.10.9
>>>>>>>> pycrypto       2.6.1
>>>>>>>> PyGObject      3.30.4
>>>>>>>> pyspark        3.1.2
>>>>>>>> pyxdg          0.25
>>>>>>>> PyYAML         5.4.1
>>>>>>>> SecretStorage  2.3.1
>>>>>>>> setuptools     57.4.0
>>>>>>>> six            1.12.0
>>>>>>>> wheel          0.32.3
>>>>>>>>
>>>>>>>> *Hmm, keras is not there*. The Docker image cannot be altered after the build! Once the image is created, it is just a snapshot. However, it will still have tons of useful stuff for most users/organisations. My suggestion is to create, for a given type (spark, spark-py etc.):
>>>>>>>>
>>>>>>>> 1. one vanilla flavour for everyday use, with a few useful packages;
>>>>>>>> 2. one for medium use, with the most common packages for ETL/ELT work;
>>>>>>>> 3. one specialist image for ML etc., with keras, tensorflow and anything else needed.
>>>>>>>>
>>>>>>>> These images should be maintained as we currently maintain Spark releases, with accompanying documentation. Any reason why we cannot maintain them ourselves?
>>>>>>>>
>>>>>>>> HTH
>>>>>>>>
>>>>>>>> On Fri, 13 Aug 2021 at 17:26, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>>>>
>>>>>>>>> So we actually do have a script that does the build already; it's more a matter of publishing the results for easier use. Currently the script produces three images: spark, spark-py, and spark-r. I can certainly see a solid reason to publish with jdk11 and jdk8 suffixes as well if there is interest in the community.
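>>>>>>>>> Anything extra on top of those three images is best baked into a derived Dockerfile rather than installed in a running container, since container changes are not persisted back to the image. A minimal sketch (the tag and UID here are assumptions, not something the script produces):
>>>>>>>>>
>>>>>>>>>     FROM spark/spark-py:3.1.1      # hypothetical tag
>>>>>>>>>     USER 0                         # root, so pip can install system-wide
>>>>>>>>>     RUN pip install --no-cache-dir keras
>>>>>>>>>     USER 185                       # assumed Spark UID; drop root again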
>>>>>>>>> If we want to have, say, a spark-py-pandas image, a Spark container image with everything necessary for the Koalas stuff to work, then I think that could be a great PR from someone to add :)
>>>>>>>>>
>>>>>>>>> On Fri, Aug 13, 2021 at 1:00 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> should read PySpark
>>>>>>>>>>
>>>>>>>>>> On Fri, 13 Aug 2021 at 08:51, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Agreed.
>>>>>>>>>>>
>>>>>>>>>>> I have already built a few of the latest images for Spark and PySpark 3.1.1 with Java 8, as I found out that Java 11 does not work with the Google BigQuery data warehouse. However, one finds out how to hack the Dockerfile the hard way.
>>>>>>>>>>>
>>>>>>>>>>> For example, how to add additional Python libraries like tensorflow etc. Loading these libraries through Kubernetes is not practical, as unzipping and installing them through --py-files etc. takes considerable time, so they need to be added to the Dockerfile at build time, in the Python bindings directory for Kubernetes:
>>>>>>>>>>>
>>>>>>>>>>> /opt/spark/kubernetes/dockerfiles/spark/bindings/python
>>>>>>>>>>>
>>>>>>>>>>> RUN pip install pyyaml numpy cx_Oracle tensorflow ....
>>>>>>>>>>>
>>>>>>>>>>> You will also need curl to test the ports from inside the container:
>>>>>>>>>>>
>>>>>>>>>>> RUN apt-get update && apt-get install -y curl
>>>>>>>>>>> RUN ["apt-get","install","-y","vim"]
>>>>>>>>>>>
>>>>>>>>>>> As I said, I am happy to build these specific Dockerfiles plus the complete documentation for them. I have already built one for Google (GCP). The difference between the Spark and PySpark versions is that in Spark/Scala a fat jar file will contain everything needed. That is not the case with Python, I am afraid.
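>>>>>>>>>>> A simple safeguard layer can be added in the same place, so the build fails if the base image ever ships a different interpreter than the one the tag advertises; a rough sketch, assuming Python 3.7 is the advertised version:
>>>>>>>>>>>
>>>>>>>>>>> RUN python3 -c "import sys; assert sys.version_info[:2] == (3, 7), sys.version"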
>>>>>>>>>>> HTH
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 13 Aug 2021 at 08:13, Bode, Meikel, NMA-CFD <meikel.b...@bertelsmann.de> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I am Meikel Bode, just an interested reader of the dev and user lists. Anyway, I would appreciate having official Docker images available.
>>>>>>>>>>>>
>>>>>>>>>>>> Maybe one could take inspiration from the Jupyter Docker stacks and provide a hierarchy of different images like this:
>>>>>>>>>>>>
>>>>>>>>>>>> https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#image-relationships
>>>>>>>>>>>>
>>>>>>>>>>>> Having a core image supporting only Java, an extended one supporting Python and/or R, etc.
>>>>>>>>>>>>
>>>>>>>>>>>> Looking forward to the discussion.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Meikel
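>>>>>>>>>>>> Such a hierarchy could look roughly like this as chained Dockerfiles (all names and versions here are made up for illustration):
>>>>>>>>>>>>
>>>>>>>>>>>> # Dockerfile.core: JVM-only base image
>>>>>>>>>>>> FROM openjdk:11-jre-slim
>>>>>>>>>>>> COPY spark /opt/spark
>>>>>>>>>>>> ENV SPARK_HOME=/opt/spark
>>>>>>>>>>>>
>>>>>>>>>>>> # Dockerfile.python: extends the core image with a Python runtime
>>>>>>>>>>>> FROM spark-core:3.1.2
>>>>>>>>>>>> RUN apt-get update && \
>>>>>>>>>>>>     apt-get install -y --no-install-recommends python3 python3-pip && \
>>>>>>>>>>>>     rm -rf /var/lib/apt/lists/*
>>>>>>>>>>>>
>>>>>>>>>>>> That way the Python and R images share the core layers instead of duplicating them.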
>>>>>>>>>>>>
>>>>>>>>>>>> *From:* Mich Talebzadeh <mich.talebza...@gmail.com>
>>>>>>>>>>>> *Sent:* Friday, 13 August 2021 08:45
>>>>>>>>>>>> *Cc:* dev <dev@spark.apache.org>
>>>>>>>>>>>> *Subject:* Re: Time to start publishing Spark Docker Images?
>>>>>>>>>>>>
>>>>>>>>>>>> I concur this is a good idea and certainly worth exploring.
>>>>>>>>>>>>
>>>>>>>>>>>> In practice, preparing deployable Docker images will throw up some challenges, because a Docker image for Spark is not really a singular modular unit in the way that, say, a Docker image for Jenkins is. It involves different versions and different images for Spark and PySpark, and will most likely end up as part of a Kubernetes deployment.
>>>>>>>>>>>>
>>>>>>>>>>>> Individuals and organisations will deploy it as a first cut. Great, but I equally feel that good documentation on how to build a consumable, deployable image will be more valuable. From my own experience, the current documentation should be enhanced: for example, how to deploy working directories, additional Python packages, builds with different Java versions (8 or 11), etc.
>>>>>>>>>>>>
>>>>>>>>>>>> HTH
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, 13 Aug 2021 at 01:54, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Awesome, I've filed an INFRA ticket to get the ball rolling.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Aug 12, 2021 at 5:48 PM John Zhuge <jzh...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> +1
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Aug 12, 2021 at 5:44 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> +1, I think we generally agreed upon having it. Thanks Holden for the heads-up and for driving this.
>>>>>>>>>>>>
>>>>>>>>>>>> +@Dongjoon Hyun <dongj...@apache.org> FYI
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, 22 Jul 2021 at 12:22 PM, Kent Yao <yaooq...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> +1
>>>>>>>>>>>>
>>>>>>>>>>>> Bests,
>>>>>>>>>>>>
>>>>>>>>>>>> Kent Yao
>>>>>>>>>>>> @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
>>
>> --
>> It's dark in this basement.