Hi! Thanks Sean and Kent! By reading your answers I have also learnt something new.
@Mich Talebzadeh <mich.talebza...@gmail.com>: you can see the commit content by prefixing the hash with https://github.com/apache/spark/commit/. So in your case:
https://github.com/apache/spark/commit/1d550c4e90275ab418b9161925049239227f3dc9

Best Regards,
Attila

On Sun, Mar 21, 2021 at 5:02 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi Kent,
>
> Thanks for the links.
>
> You have to excuse my ignorance, but what is the connection between these
> links and the ability to establish a Spark build version?
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Sun, 21 Mar 2021 at 15:55, Kent Yao <yaooq...@qq.com> wrote:
>
>> Please refer to
>> http://spark.apache.org/docs/latest/api/sql/index.html#version
>>
>> Kent Yao
>> @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
>> a spark enthusiast
>> kyuubi <https://github.com/yaooqinn/kyuubi> is a unified multi-tenant
>> JDBC interface for large-scale data processing and analytics, built on
>> top of Apache Spark <http://spark.apache.org/>.
>> spark-authorizer <https://github.com/yaooqinn/spark-authorizer> A Spark
>> SQL extension which provides SQL Standard Authorization for Apache Spark.
>> spark-postgres <https://github.com/yaooqinn/spark-postgres> A library
>> for reading data from and transferring data to Postgres / Greenplum with
>> Spark SQL and DataFrames, 10~100x faster.
>> spark-func-extras <https://github.com/yaooqinn/spark-func-extras> A
>> library that brings excellent and useful functions from various modern
>> database management systems to Apache Spark.
>>
>> On 03/21/2021 23:28, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>> Many thanks
>>
>> spark-sql> SELECT version();
>> 3.1.1 1d550c4e90275ab418b9161925049239227f3dc9
>>
>> What does 1d550c4e90275ab418b9161925049239227f3dc9 signify, please?
>>
>> On Sun, 21 Mar 2021 at 15:14, Sean Owen <sro...@gmail.com> wrote:
>>
>>> I believe you can run "SELECT version()" in Spark SQL to see the build
>>> version.
>>>
>>> On Sun, Mar 21, 2021 at 4:41 AM Mich Talebzadeh
>>> <mich.talebza...@gmail.com> wrote:
>>>
>>>> Thanks for the detailed info.
>>>>
>>>> I was hoping one could find a simpler answer to the Spark version
>>>> question than doing a forensic examination of the base code, so to
>>>> speak.
>>>>
>>>> The primer for this verification is that on GCP Dataproc, originally
>>>> built on 3.1.1-rc2, there was an issue with running Spark Structured
>>>> Streaming (SSS), which I reported to this forum before.
>>>>
>>>> After a while, and after I reported it to Google, they have now
>>>> upgraded the base to Spark 3.1.1 itself. I am not privy to how they
>>>> did the upgrade.
>>>>
>>>> In the meantime we installed 3.1.1 on-premises and ran it with the
>>>> same Python code for SSS. It worked fine.
>>>>
>>>> However, when I run the same code on GCP Dataproc upgraded to 3.1.1,
>>>> occasionally I see this error:
>>>>
>>>> 21/03/18 16:53:38 ERROR org.apache.spark.scheduler.AsyncEventQueue:
>>>> Listener EventLoggingListener threw an exception
>>>> java.util.ConcurrentModificationException
>>>>         at java.util.Hashtable$Enumerator.next(Hashtable.java:1387)
>>>>
>>>> This may be for other reasons, or a consequence of upgrading from
>>>> 3.1.1-rc2 to 3.1.1?
>>>>
>>>> On Sat, 20 Mar 2021 at 22:41, Attila Zsolt Piros
>>>> <piros.attila.zs...@gmail.com> wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> I would check out the Spark source and then diff those two RCs
>>>>> (first just take a look at the list of changed files):
>>>>>
>>>>> $ git diff v3.1.1-rc1..v3.1.1-rc2 --stat
>>>>> ...
>>>>>
>>>>> The shell scripts in the release can be checked very easily:
>>>>>
>>>>> $ git diff v3.1.1-rc1..v3.1.1-rc2 --stat | grep ".sh "
>>>>>  bin/docker-image-tool.sh            | 6 +-
>>>>>  dev/create-release/release-build.sh | 2 +-
>>>>>
>>>>> We are lucky, as docker-image-tool.sh is part of the released
>>>>> version. Is it from v3.1.1-rc2 or v3.1.1-rc1?
>>>>>
>>>>> Of course this only works if docker-image-tool.sh was not changed
>>>>> back from v3.1.1-rc2 to v3.1.1-rc1.
>>>>> So let's continue with the Python (and later the R) files:
>>>>>
>>>>> $ git diff v3.1.1-rc1..v3.1.1-rc2 --stat | grep ".py "
>>>>>  python/pyspark/sql/avro/functions.py               |   4 +-
>>>>>  python/pyspark/sql/dataframe.py                    |   1 +
>>>>>  python/pyspark/sql/functions.py                    | 285 +++++------
>>>>>  .../pyspark/sql/tests/test_pandas_cogrouped_map.py |  12 +
>>>>>  python/pyspark/sql/tests/test_pandas_map.py        |   8 +
>>>>>  ...
>>>>>
>>>>> After you have enough proof you can stop (what counts as enough is
>>>>> for you to decide).
>>>>> Finally, you can use javap / scalap on the classes from the jars and
>>>>> check some code changes, which is harder to analyze than a simple
>>>>> text file.
>>>>>
>>>>> Best Regards,
>>>>> Attila
>>>>>
>>>>> On Thu, Mar 18, 2021 at 4:09 PM Mich Talebzadeh
>>>>> <mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> What would be a signature in the Spark version or binaries that
>>>>>> confirms the release is built on Spark 3.1.1, as opposed to
>>>>>> 3.1.1-RC-1 or RC-2?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Mich
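[Editor's note] Attila's tip of prefixing the hash from SELECT version() with the GitHub commit URL can be done programmatically. A minimal sketch in Python, assuming the two-token "<version> <commit-sha>" output format shown in the thread (commit_url is a hypothetical helper name):

```python
# Minimal sketch: map the output of Spark SQL's version() to a browsable
# GitHub commit URL, following the prefixing tip from the thread.
# Assumes the output is exactly "<version> <commit-sha>".
def commit_url(version_output: str) -> str:
    version, sha = version_output.split()  # e.g. ("3.1.1", "1d550c4e...")
    return f"https://github.com/apache/spark/commit/{sha}"

print(commit_url("3.1.1 1d550c4e90275ab418b9161925049239227f3dc9"))
# https://github.com/apache/spark/commit/1d550c4e90275ab418b9161925049239227f3dc9
```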
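[Editor's note] On the ConcurrentModificationException in the Dataproc log above: it is Java's fail-fast signal that a Hashtable was structurally modified while being enumerated. A rough Python analogue of the same failure mode, for illustration only (this is not Spark's actual code path):

```python
# Illustration only: Python dicts raise a comparable fail-fast error when
# the mapping is mutated mid-iteration, much as Java's
# Hashtable$Enumerator.next throws ConcurrentModificationException.
def mutate_during_iteration() -> str:
    d = {"a": 1, "b": 2}
    try:
        for key in d:
            d["c"] = 3  # structural modification while iterating
    except RuntimeError as exc:
        return type(exc).__name__
    return "no error"

print(mutate_during_iteration())
# RuntimeError
```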