Unable to handle bignumeric datatype in spark/pyspark

2023-02-23 Thread nidhi kher
Hello, I am facing the issue below in PySpark code. We are running Spark code using a Dataproc Serverless batch on Google Cloud Platform. The Spark code causes an issue while writing data to a BigQuery table. In the BigQuery table, a few of the columns have the BIGNUMERIC datatype, and the Spark code is changing

unsubscribe

2023-02-23 Thread Roberto Jr
Please unsubscribe me from this email list. Thank you in advance. Roberto.

Re: [PySpark SQL] New column with the maximum of multiple terms?

2023-02-23 Thread Sean Owen
That's pretty impressive. I'm not sure it's quite right - not clear that the intent is taking a minimum of absolute values (is it? that'd be wild). But I think it might have pointed in the right direction. I'm not quite sure why that error pops out, but I think 'max' is the wrong function. That's

Re: [PySpark SQL] New column with the maximum of multiple terms?

2023-02-23 Thread Bjørn Jørgensen
I'm trying to learn how to use ChatGPT for coding. So after a little chat I got this. The code you provided seems to calculate the distance between a gene and a variant by finding the maximum value between the difference of the variant position and the gene start position, the difference of the

Re: [PySpark SQL] New column with the maximum of multiple terms?

2023-02-23 Thread Russell Jurney
Usually, the solution to these problems is to do less per line: break it out, perform each small operation as its own field, then combine those into a final answer. Can you do that here? Thanks, Russell Jurney

Re: [PySpark SQL] New column with the maximum of multiple terms?

2023-02-23 Thread Oliver Ruebenacker
Here is the complete error: ``` Traceback (most recent call last): File "nearest-gene.py", line 74, in main() File "nearest-gene.py", line 62, in main distances = joined.withColumn("distance", max(col("start") - col("position"), col("position") - col("end"), 0)) File

Re: [PySpark SQL] New column with the maximum of multiple terms?

2023-02-23 Thread Sean Owen
That error sounds like it's from pandas, not Spark. Are you sure it's this line? On Thu, Feb 23, 2023, 12:57 PM Oliver Ruebenacker <oliv...@broadinstitute.org> wrote: > Hello, > > I'm trying to calculate the distance between a gene (with start and end) > and a variant (with position),

[PySpark SQL] New column with the maximum of multiple terms?

2023-02-23 Thread Oliver Ruebenacker
Hello, I'm trying to calculate the distance between a gene (with start and end) and a variant (with position), so I joined gene and variant data by chromosome and then tried to calculate the distance like this: ``` distances = joined.withColumn("distance", max(col("start") -
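
For readers following this thread: the expression quoted in the traceback passes three terms to `max`, and one reply suggests `max` is the wrong function. A common way to take a row-wise maximum across several column expressions in PySpark is `pyspark.sql.functions.greatest`, with the literal `0` wrapped in `lit`. The sketch below assumes a DataFrame `joined` with numeric `start`, `end`, and `position` columns, as in the question. The helper names `interval_distance` and `with_distance` are hypothetical and only for illustration, and this is not confirmed to be the fix the thread settled on.

```python
def interval_distance(start, end, position):
    """Plain-Python reference for the intended per-row value.

    Returns 0 when position lies within [start, end], otherwise the gap
    to the nearer edge of the interval.
    """
    return max(start - position, position - end, 0)


def with_distance(joined):
    """Add a "distance" column to a Spark DataFrame (hypothetical helper).

    Assumes `joined` has numeric columns "start", "end" and "position".
    """
    # Imported inside the function so the reference function above can be
    # used without PySpark installed.
    from pyspark.sql.functions import col, greatest, lit

    return joined.withColumn(
        "distance",
        # greatest() compares its column arguments row by row.
        greatest(
            col("start") - col("position"),
            col("position") - col("end"),
            lit(0),  # wrap the constant so it is passed as a Column
        ),
    )
```

With a gene spanning 100 to 200, `interval_distance(100, 200, 50)` and `interval_distance(100, 200, 250)` both give 50, and any position inside the gene gives 0. `with_distance(joined)` is meant to compute the same value per row.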