Hello All,

I have a Structured Streaming pyspark program running on GCP Dataproc,
which reads data from Kafka, and does some data massaging, and aggregation.
I'm trying to use withWatermark(), and it is giving error.

py4j.Py4JException: An exception was raised by the Python Proxy. Return
Message: Traceback (most recent call last):

  File
"/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line
2442, in _call_proxy

    return_value = getattr(self.pool[obj_id], method)(*params)

  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line
196, in call

    raise e

  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line
193, in call

    self.func(DataFrame(jdf, self.sql_ctx), batch_id)

  File
"/tmp/178d0ac9c82e42a09942f7f9cdc76bb7/StructuredStreaming_GCP_Versa_Sase_gcloud.py",
line 444, in convertToDictForEachBatch

    ap = Alarm(tdict, spark)

  File
"/tmp/178d0ac9c82e42a09942f7f9cdc76bb7/StructuredStreaming_GCP_Versa_Sase_gcloud.py",
line 356, in __init__

    computeCount(l_alarm_df, l_alarm1_df)

  File
"/tmp/178d0ac9c82e42a09942f7f9cdc76bb7/StructuredStreaming_GCP_Versa_Sase_gcloud.py",
line 262, in computeCount

    window(col("timestamp"), "10 minutes").alias("window")

TypeError: 'module' object is not callable

Details are in stackoverflow below :
https://stackoverflow.com/questions/71137296/structuredstreaming-withwatermark-typeerror-module-object-is-not-callable

Any ideas on how to debug/fix this ?
tia !

Reply via email to