Spark 2.4.7

2023-08-25 Thread Harry Jamison
I am using Python 3.7 and Spark 2.4.7.
I am not sure what the best way to do this is.
I have a dataframe with a URL in one of the columns, and I want to download the
contents of that URL and put it in a new column.
Can someone point me in the right direction on how to do this? I looked at
UDFs and they seem confusing to me.
Also, is there a good way to rate limit the number of calls I make per second?
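No reply appears in this digest, but one possible sketch is a plain-Python fetch helper with a simple per-process rate limiter, which could then be wrapped in a Spark UDF. `RateLimiter` and `fetch_url` below are illustrative names, not Spark APIs, and note the limit only applies within one Python worker process, not across the whole cluster:

```python
import time
import urllib.request
from threading import Lock

class RateLimiter:
    """Allow at most `rate` calls per second within this process."""
    def __init__(self, rate):
        self.interval = 1.0 / rate
        self.lock = Lock()
        self.next_time = 0.0

    def wait(self):
        """Block until the next call is allowed, then reserve the slot."""
        with self.lock:
            now = time.monotonic()
            if now < self.next_time:
                time.sleep(self.next_time - now)
            self.next_time = max(now, self.next_time) + self.interval

limiter = RateLimiter(rate=5)  # at most 5 requests/second per worker

def fetch_url(url):
    """Return the body of `url` as text, or None on any failure."""
    if url is None:
        return None
    limiter.wait()
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except Exception:
        return None  # leave the new column null rather than fail the task

# On the Spark side this would be registered as a UDF, roughly:
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import StringType
#   fetch_udf = udf(fetch_url, StringType())
#   df = df.withColumn("contents", fetch_udf(df["url"]))
```

Because UDFs run on the executors, each worker gets its own limiter; a cluster-wide cap would need an external coordinator or a lower per-worker rate.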

Re: mysterious spark.sql.utils.AnalysisException Union in spark 3.3.2, but not seen in 3.4.0+

2023-08-25 Thread Mich Talebzadeh
Hi Srivatsan,

Ground investigation

   1. Does this union explicitly exist in your code? If not, where are the
   counts of 7 and 6 columns coming from?
   2. On 3.3.2, have you looked at the Spark UI and the relevant DAG diagram?
   3. Check the query execution plan using the explain() functionality.
   4. Can you reproduce this error on 3.3.2 using a smaller sample of data
   and a simplified query?
   5. Check the Spark 3.3.2 and 3.4 release notes for any changes and
   bug fixes relevant to this case.
   6. Have you reported this issue to the EMR user group?
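To see concretely how such a column-count mismatch can arise, here is a small pure-Python model of the difference between a positional union and unionByName with allowMissingColumns=True. This is only an illustration of the semantics (the function and its signature are hypothetical, not Spark's implementation):

```python
def union_by_name(rows_a, cols_a, rows_b, cols_b, allow_missing=True):
    """Union two 'tables' (lists of row tuples plus column-name lists) by
    column name. With allow_missing=True, columns absent from one side are
    padded with None, mimicking unionByName(..., allowMissingColumns=True).
    With allow_missing=False, mismatched schemas raise, like a plain union."""
    if not allow_missing and set(cols_a) != set(cols_b):
        raise ValueError("Union requires tables with the same columns")
    # Stable, de-duplicated order of all column names seen on either side.
    all_cols = list(dict.fromkeys(cols_a + cols_b))

    def project(rows, cols):
        idx = {c: i for i, c in enumerate(cols)}
        return [tuple(r[idx[c]] if c in idx else None for c in all_cols)
                for r in rows]

    return all_cols, project(rows_a, cols_a) + project(rows_b, cols_b)
```

If the optimizer rewrites part of a query into a positional Union over branches whose schemas no longer line up (7 columns vs 6), the analyzer raises exactly the error quoted in the thread, even though the user-facing code only ever called unionByName.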


HTH

Mich Talebzadeh,
Distinguished Technologist, Solutions Architect & Engineer
London
United Kingdom


   view my LinkedIn profile

   https://en.everybodywiki.com/Mich_Talebzadeh


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.






mysterious spark.sql.utils.AnalysisException Union in spark 3.3.2, but not seen in 3.4.0+

2023-08-25 Thread Srivatsan vn
Hello Users,

   I have been seeing some weird issues since I upgraded my
EMR setup to 6.11 (which uses Spark 3.3.2). The call stack seems to point
to a code location where there is no explicit union, and I use
unionByName everywhere in the codebase with allowMissingColumns set to
True. I suspect the union reported in the exception was inserted into
the plan by the Spark optimizer?

spark.sql.utils.AnalysisException: Union can only be performed on tables
with the same number of columns, but the first table has 7 columns and the
second table has 6 columns

The issue seems to have disappeared when I did a quick test with Spark
3.4.0 in my local setup. I am just curious whether this is a known issue in
the Spark user/dev community or if I am missing something.


Thanks

Srivatsan


Unsubscribe

2023-08-25 Thread Dipayan Dev



Spark Connect: API mismatch in SparkSession#execute

2023-08-25 Thread Stefan Hagedorn
Hi everyone,

I’m trying to use the “extension” feature of the Spark Connect CommandPlugin 
(Spark 3.4.1).

I created a simple protobuf message `MyMessage` that I want to send from the 
connect client-side to the connect server (where I registered my plugin).

The SparkSession class in `spark-connect-client-jvm` provides a method 
`execute` that accepts a `com.google.protobuf.Any`, so I packed the MyMessage 
object in an Any:

val spark = SparkSession.builder().remote("sc://localhost").build()

val cmd = com.test.MyMessage.newBuilder().setBlubb("hello world").build()
val googleAny = com.google.protobuf.Any.pack(cmd)

spark.execute(googleAny)


This compiles, but during execution I receive a NoSuchMethodError:
java.lang.NoSuchMethodError: 'void 
org.apache.spark.sql.SparkSession.execute(com.google.protobuf.Any)'

After looking around for a while I found that
spark-connect-client-jvm_2.12-3.4.1.jar!SparkSession#execute accepts an
`org.sparkproject.connect.client.com.google.protobuf.Any`.

Am I missing something? Is there an additional build step or should I use a 
specific plugin?


Thanks,
Stefan