Re: Spark Structured Streaming Continuous Trigger Mode React to an External Trigger

2021-03-29 Thread shahrajesh2006
I tried to create a Dataset by loading a file and pass it as an argument to a Java method, as below:
Dataset propertiesFile // Dataset created by loading a JSON property file
Dataset streamingQuery // Dataset for the streaming query
streamingQuery.map( row -> myfunction( row, propertiesFile),
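
For reference, a minimal Scala sketch of one way this pattern can work; the file path, the rate source, and the body of myfunction are illustrative stand-ins, not from the thread. Since a Dataset cannot be referenced inside another Dataset's map function, the sketch collects the properties to the driver first:

import org.apache.spark.sql.{Row, SparkSession}

object PropertiesMapSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("props-map").getOrCreate()
    import spark.implicits._

    // Load the JSON property file and materialise it on the driver;
    // a Dataset cannot be used inside another Dataset's map function.
    val properties: Array[Row] = spark.read.json("/path/to/props.json").collect()

    // Illustrative streaming source.
    val streaming = spark.readStream.format("rate").load()

    // Illustrative row-level function using the driver-side properties.
    def myfunction(row: Row, props: Array[Row]): String =
      s"${row.mkString(",")} with ${props.length} properties"

    val mapped = streaming.map(row => myfunction(row, properties))

    mapped.writeStream.format("console").start().awaitTermination()
  }
}

For larger lookup data, broadcasting the collected value would be the usual alternative to capturing it directly in the closure.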

Re: Ubuntu 18.04: Docker: start-master.sh: command not found

2021-03-29 Thread Mich Talebzadeh
OK, just do cat ~/.profile and send the content. Please also do echo $PATH and send the output as well. HTH

Re: Ubuntu 18.04: Docker: start-master.sh: command not found

2021-03-29 Thread GUINKO Ferdinand
Hi Talebzadeh, thank you for your answer. I am sorry, but I don't understand what I need to do. It seems that I already put SPARK_HOME/sbin in my PATH, as I have this line among the 3 lines I sent in my initial email: echo "export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin" >> ~/.profile

Re: Ubuntu 18.04: Docker: start-master.sh: command not found

2021-03-29 Thread Mich Talebzadeh
In my case:
export SPARK_HOME=/d4T/hduser/spark-3.1.1-bin-hadoop3.2
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
which start-master.sh
/d4T/hduser/spark-3.1.1-bin-hadoop3.2/sbin/start-master.sh
You need to add $SPARK_HOME/sbin to your PATH as well. HTH

Ubuntu 18.04: Docker: start-master.sh: command not found

2021-03-29 Thread GUINKO Ferdinand
Hi, I have installed Docker, Spark 3.1.1 and Hadoop 2.7 on Ubuntu 18.04. After I executed the following 3 lines:
echo "export SPARK_HOME=/opt/spark" >> ~/.profile
echo "export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin" >> ~/.profile
echo "export PYSPARK_PYTHON=/usr/bin/python3" >>

How to gracefully shutdown spark job on kubernetes

2021-03-29 Thread Sachit Murarka
Hi All, I am using Spark 3.0.1 with Python 3.8. I am calling spark.stop() at the end to gracefully shut down the job once the processing is done, but the job keeps running, and it gives the following exception every 5 mins. Can someone please help with this? 21/03/29 17:46:39 WARN
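
A minimal Scala sketch of the graceful-shutdown pattern described above (the poster's job is PySpark, and its processing logic is not shown in the thread, so the step here is a stand-in): calling spark.stop() in a finally block ensures it runs even when processing fails; if the JVM still does not exit afterwards, a lingering non-daemon thread is a common culprit.

import org.apache.spark.sql.SparkSession

object GracefulStopSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("graceful-stop").getOrCreate()
    try {
      // Stand-in for the real processing; not from the thread.
      spark.range(0, 1000).selectExpr("sum(id)").show()
    } finally {
      // Release executors and shut the driver down cleanly.
      spark.stop()
    }
  }
}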

Re: Error Message Suggestion

2021-03-29 Thread Sean Owen
Sure, just open a pull request? On Mon, Mar 29, 2021 at 10:37 AM Josh Herzberg wrote:

Error Message Suggestion

2021-03-29 Thread Josh Herzberg
Hi, I'd like to suggest this change to the PySpark code. I haven't contributed before, so https://spark.apache.org/contributing.html suggested emailing here first. In the error raised here: https://github.com/apache/spark/blob/b2bfe985e8adf55e5df5887340fd862776033a06/python/pyspark/worker.py#L141,

Re: The trigger interval in spark structured streaming

2021-03-29 Thread Mich Talebzadeh
On the subject of workload management, the usual rule to watch is: Processing Time + Reserved Capacity < Batch Interval. We know the Batch Interval, i.e. how often a micro-batch is triggered, and the rate at which the upstream source sends messages through Kafka. We can start by assuming that the rate of increase in the number of
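
A hypothetical illustration of the rule above (the numbers are not from the thread): with a 60-second batch interval, a micro-batch that takes 45 seconds of processing time plus, say, 10 seconds of reserved capacity satisfies 45 + 10 < 60, so the job keeps up; once processing time creeps past 50 seconds, batches start to queue and the job falls behind. In Structured Streaming the batch interval is set through the trigger, as in this minimal Scala sketch (broker and topic names are illustrative):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

object TriggerIntervalSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("trigger-interval").getOrCreate()

    // Hypothetical Kafka source; requires the spark-sql-kafka-0-10 package.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()

    // Fire a micro-batch every 60 seconds: processing time per batch
    // plus any reserved headroom should stay below this interval.
    df.writeStream
      .format("console")
      .trigger(Trigger.ProcessingTime("60 seconds"))
      .start()
      .awaitTermination()
  }
}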

Source.getBatch and schema vs qe.analyzed.schema?

2021-03-29 Thread Jacek Laskowski
Hi, I've been developing a data source with a source and a sink for Spark Structured Streaming. I've got a question about Source.getBatch [1]:
def getBatch(start: Option[Offset], end: Offset): DataFrame
getBatch returns a streaming DataFrame between the offsets, so the idiom (?) is to have a
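
For context, a minimal sketch of the V1 Source contract the question refers to; the class and its contents are hypothetical. The schema the source declares and the schema of the DataFrame returned by getBatch are the two that can drift apart. The sketch uses the internalCreateDataFrame idiom to return a streaming DataFrame, which means it has to live in the org.apache.spark.sql package because that method is private[sql]:

package org.apache.spark.sql // internalCreateDataFrame is private[sql]

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.streaming.{LongOffset, Offset, Source}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Illustrative V1 streaming source; offset bookkeeping is elided.
class DemoSource(sqlContext: SQLContext) extends Source {

  // The schema the source declares up front.
  override def schema: StructType = StructType(StructField("value", LongType) :: Nil)

  override def getOffset: Option[Offset] = Some(LongOffset(0L))

  // getBatch must return a *streaming* DataFrame, hence
  // internalCreateDataFrame(..., isStreaming = true); its schema is the
  // one compared against what the query plan (qe.analyzed) ends up with.
  override def getBatch(start: Option[Offset], end: Offset): DataFrame = {
    val rows = sqlContext.sparkContext
      .parallelize(0L until 10L)
      .map(l => InternalRow(l))
    sqlContext.internalCreateDataFrame(rows, schema, isStreaming = true)
  }

  override def commit(end: Offset): Unit = ()
  override def stop(): Unit = ()
}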