I believe you can run "SELECT version()" in Spark SQL to see the build
version; it should also include the git revision the build came from.
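
For reference, in Spark 3.x the SQL version() function returns the short
version plus the git revision the distribution was built from, so the
revision can be matched against the release tags. A minimal sketch of that
matching; the hashes in the table below are placeholders, not the real tag
SHAs (obtain those with `git rev-parse v3.1.1-rc1` and so on):

```python
# Sketch: identify the build from the output of "SELECT version()".
# In Spark 3.x the function returns "<version> <git revision>".
# The tag-to-commit mapping below uses PLACEHOLDER prefixes; fill in
# the real ones from `git rev-parse <tag>` in a Spark checkout.

KNOWN_TAGS = {
    "aaaaaaaa": "v3.1.1-rc1",   # placeholder hash prefix
    "bbbbbbbb": "v3.1.1-rc2",   # placeholder hash prefix
    "cccccccc": "v3.1.1",       # placeholder hash prefix
}

def identify_build(version_string: str) -> str:
    """Map the revision in a version() result to a known release tag."""
    parts = version_string.split()
    if len(parts) != 2:
        return "unrecognized version string"
    version, revision = parts
    tag = KNOWN_TAGS.get(revision[:8], "unknown revision")
    return f"{version} built from {tag}"

print(identify_build("3.1.1 cccccccc12345678"))  # 3.1.1 built from v3.1.1
```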

On Sun, Mar 21, 2021 at 4:41 AM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Thanks for the detailed info.
>
> I was hoping there would be a simpler answer to the Spark version question
> than doing a forensic examination of the code base, so to speak.
>
> The reason for this verification is that on GCP Dataproc clusters
> originally built on 3.1.1-rc2, there was an issue with running Spark
> Structured Streaming (SSS) which I reported to this forum before.
>
> After a while, and after I reported it to Google, they upgraded the base
> to Spark 3.1.1 itself. I am not privy to how they did the upgrade.
>
> In the meantime we installed 3.1.1 on-premise and ran it with the same
> Python code for SSS. It worked fine.
>
> However, when I run the same code on GCP dataproc upgraded to 3.1.1,
> occasionally I see this error
>
> 21/03/18 16:53:38 ERROR org.apache.spark.scheduler.AsyncEventQueue:
> Listener EventLoggingListener threw an exception
>
> java.util.ConcurrentModificationException
>
>         at java.util.Hashtable$Enumerator.next(Hashtable.java:1387)
>
> This may be for other reasons, or a consequence of upgrading from
> 3.1.1-rc2 to 3.1.1?
>
>
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Sat, 20 Mar 2021 at 22:41, Attila Zsolt Piros <
> piros.attila.zs...@gmail.com> wrote:
>
>> Hi!
>>
>> I would check out the Spark source and then diff the two RCs (first just
>> take a look at the list of changed files):
>>
>> $ git diff v3.1.1-rc1..v3.1.1-rc2 --stat
>> ...
>>
>> The shell scripts in the release can be checked very easily:
>>
>> $ git diff v3.1.1-rc1..v3.1.1-rc2 --stat | grep ".sh "
>>  bin/docker-image-tool.sh                           |   6 +-
>>  dev/create-release/release-build.sh                |   2 +-
>>
>> We are lucky, as *docker-image-tool.sh* is part of the released
>> distribution. Is it from v3.1.1-rc1 or v3.1.1-rc2?
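
One low-effort way to run that check: extract the script from each tag
(`git show v3.1.1-rc1:bin/docker-image-tool.sh > rc1.sh`, likewise for rc2)
and compare checksums against the installed copy. A sketch, with throwaway
files standing in for the real ones:

```python
import hashlib
from pathlib import Path
from tempfile import TemporaryDirectory

def sha256_of(path):
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def matching_tags(installed, candidates):
    """Return the tags whose copy of the file matches the installed one."""
    want = sha256_of(installed)
    return [tag for tag, path in candidates.items() if sha256_of(path) == want]

# Demo with throwaway files standing in for:
#   installed : $SPARK_HOME/bin/docker-image-tool.sh
#   rc1/rc2   : copies extracted with `git show <tag>:bin/docker-image-tool.sh`
with TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "installed.sh").write_text("echo rc2 version\n")
    (d / "rc1.sh").write_text("echo rc1 version\n")
    (d / "rc2.sh").write_text("echo rc2 version\n")
    tags = matching_tags(d / "installed.sh",
                         {"v3.1.1-rc1": d / "rc1.sh",
                          "v3.1.1-rc2": d / "rc2.sh"})
    print(tags)  # ['v3.1.1-rc2']
```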
>>
>> Of course this only works if docker-image-tool.sh was not changed back
>> from v3.1.1-rc2 to v3.1.1-rc1.
>> So let's continue with the Python (and later the R) files:
>>
>> $ git diff v3.1.1-rc1..v3.1.1-rc2 --stat | grep ".py "
>>  python/pyspark/sql/avro/functions.py               |   4 +-
>>  python/pyspark/sql/dataframe.py                    |   1 +
>>  python/pyspark/sql/functions.py                    | 285 +++++------
>>  .../pyspark/sql/tests/test_pandas_cogrouped_map.py |  12 +
>>  python/pyspark/sql/tests/test_pandas_map.py        |   8 +
>> ...
>>
>> Once you have enough proof you can stop (what counts as enough is for
>> you to decide).
>> Finally, you can use javap / scalap on the classes from the jars and
>> check for code changes, which is harder to analyze than a simple text
>> file.
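
Before reaching for javap/scalap it may be worth diffing the jars entry by
entry first: zip archives store a CRC-32 per entry, so classes that differ
between two builds can be listed without decompiling anything. A sketch
using tiny demo jars built in memory; in practice you would pass the paths
of the two spark jars:

```python
import zipfile
from io import BytesIO

def differing_entries(jar_a, jar_b):
    """Names of entries whose CRC-32 differs, or that exist in only one jar."""
    with zipfile.ZipFile(jar_a) as a, zipfile.ZipFile(jar_b) as b:
        crc_a = {i.filename: i.CRC for i in a.infolist()}
        crc_b = {i.filename: i.CRC for i in b.infolist()}
    names = set(crc_a) | set(crc_b)
    return sorted(n for n in names if crc_a.get(n) != crc_b.get(n))

def make_demo_jar(entries):
    """Build an in-memory zip standing in for a real jar file."""
    buf = BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for name, data in entries.items():
            z.writestr(name, data)
    buf.seek(0)
    return buf

rc1 = make_demo_jar({"A.class": b"old bytecode", "B.class": b"same"})
rc2 = make_demo_jar({"A.class": b"new bytecode", "B.class": b"same"})
print(differing_entries(rc1, rc2))  # ['A.class']
```

Only the classes this flags then need a closer look with javap / scalap.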
>>
>> Best Regards,
>> Attila
>>
>>
>> On Thu, Mar 18, 2021 at 4:09 PM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> What would be a signature in the Spark version or binaries that confirms
>>> the release is built from the final 3.1.1 as opposed to 3.1.1-RC1 or RC2?
>>>
>>> Thanks
>>>
>>> Mich
>>>
>>>
>>>
>>>
>>>
>>
