That error sounds like it's from pandas, not Spark. Are you sure it's this line?
On Thu, Feb 23, 2023, 12:57 PM Oliver Ruebenacker <oliv...@broadinstitute.org> wrote:

> Hello,
>
> I'm trying to calculate the distance between a gene (with start and end)
> and a variant (with position), so I joined gene and variant data by
> chromosome and then tried to calculate the distance like this:
>
> ```
> distances = joined.withColumn("distance", max(col("start") -
> col("position"), col("position") - col("end"), 0))
> ```
>
> Basically, the distance is the maximum of three terms.
>
> This line causes an obscure error:
>
> ```
> ValueError: Cannot convert column into bool: please use '&' for 'and', '|'
> for 'or', '~' for 'not' when building DataFrame boolean expressions.
> ```
>
> How can I do this? Thanks!
>
> Best, Oliver
>
> --
> Oliver Ruebenacker, Ph.D. (he)
> Senior Software Engineer, Knowledge Portal Network <http://kp4cd.org/>,
> Flannick Lab <http://www.flannicklab.org/>, Broad Institute
> <http://www.broadinstitute.org/>
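
If the `max` in the quoted line is Python's built-in, it compares its arguments with `>` and needs a plain bool back, which a Spark Column cannot provide. A row-wise maximum across several columns is usually written with `greatest` from `pyspark.sql.functions`, with the constant wrapped in `lit`. The following is only a sketch under that assumption: the helper names `gene_variant_distance` and `with_distance` are made up for illustration, and `joined` is assumed to have numeric `start`, `end`, and `position` columns as described in the quoted message.

```python
def gene_variant_distance(start, end, position):
    """Plain-Python reference: 0 inside the gene, else the gap to the nearer edge."""
    return max(start - position, position - end, 0)


def with_distance(joined):
    """Add a "distance" column to a Spark DataFrame (hypothetical helper)."""
    # Imported inside the function so the reference function above can be
    # used even where PySpark is not installed.
    from pyspark.sql.functions import col, greatest, lit

    # greatest() takes the maximum across columns within each row.
    # lit(0) turns the Python constant into a Column expression.
    return joined.withColumn(
        "distance",
        greatest(
            col("start") - col("position"),
            col("position") - col("end"),
            lit(0),
        ),
    )
```

With this in place, `distances = with_distance(joined)` would replace the quoted `withColumn` call. Note that `max` in `pyspark.sql.functions` is an aggregate over rows, so it is not a substitute for the row-wise `greatest` here.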