Hello All,
I have a GCP Dataproc cluster, and I'm running a Spark Structured Streaming
job on it.
I'm trying to use KafkaProducer to push aggregated data into a Kafka
topic, but when I import KafkaProducer
(from kafka import KafkaProducer),
it gives this error:
```
Traceback (most recent call
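```

The traceback is cut off above, but if the failure is that the kafka module cannot be found on the cluster nodes, one common fix (an assumption on my part, since the full error is missing) is to pip-install kafka-python when the Dataproc cluster is created. A hedged sketch; the cluster name, region, and pinned version below are placeholders:

```shell
# Hypothetical example: adjust cluster name, region, and package version.
# The dataproc:pip.packages cluster property pip-installs the listed
# packages on all cluster nodes at creation time.
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --properties='dataproc:pip.packages=kafka-python==2.0.2'
```

Alternatively, Spark's built-in Kafka sink (format("kafka")) avoids the need for a client-side producer library on the executors.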
I found the reason why it did not work:
When returning the Spark data type, I was calling new StringType(). When I
changed it to DataTypes.StringType, it worked.
Greets,
Rico.
> Am 17.02.2022 um 14:13 schrieb Gourav Sengupta :
>
>
> Hi,
>
> can you please post a screen shot of the exact
Hello,
I am working on optimising the performance of a Java ML/NLP application
based on Spark / SparkNLP. For prediction, I am applying a trained model
on a Spark dataset which consists of one column with only one row. The
dataset is created like this:
List textList =
Hi,
Could you help me with the issue below? Thanks!
This is my source code:
SparkConf sparkConf = new SparkConf(true);
sparkConf.setAppName(ESTest.class.getName());
SparkSession spark = null;
sparkConf.setMaster("local[*]");
sparkConf.set("spark.cleaner.ttl", "3600");
sparkConf.set("es.nodes",
Hello All,
I have a PySpark dataframe which I need to write to a Kafka topic.
Structure of the DF is:
root
 |-- window: struct (nullable = true)
 |    |-- start: timestamp (nullable = false)
 |    |-- end: timestamp (nullable = false)
 |-- processedAlarmCnt: integer (nullable = false)
 |--
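A common way to feed such a dataframe to Kafka is to serialize each aggregated row into a JSON string for the Kafka record's value field; in Spark itself this is typically done with to_json(struct(...)) before writing with format("kafka"). As a minimal non-Spark sketch of the payload itself, assuming the schema above (row_to_kafka_value is a hypothetical helper, not Spark API):

```python
import json
from datetime import datetime

def row_to_kafka_value(window_start, window_end, processed_alarm_cnt):
    # Serialize one aggregated row into the JSON string that would go
    # into the Kafka record's "value" field.
    return json.dumps({
        "window": {
            "start": window_start.isoformat(),
            "end": window_end.isoformat(),
        },
        "processedAlarmCnt": processed_alarm_cnt,
    })

value = row_to_kafka_value(
    datetime(2022, 2, 17, 14, 0),
    datetime(2022, 2, 17, 14, 5),
    42,
)
```

In PySpark the equivalent would be something like df.select(to_json(struct("window", "processedAlarmCnt")).alias("value")) followed by the Kafka writer.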
Just create a directory as below on a GCP storage bucket
CODE_DIRECTORY_CLOUD="gs://spark-on-k8s/codes/"
Put your jar file there
gsutil cp /opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar
$CODE_DIRECTORY_CLOUD
--conf spark.kubernetes.file.upload.path=file:///tmp \
Please try these two corrections:
1. The --packages isn't the right command line argument for
spark-submit. Please use --conf spark.jars.packages=your-package to
specify Maven packages or define your configuration parameters in
the spark-defaults.conf file
2. Please check the version
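If you go the configuration-property route suggested in point 1, a hedged sketch of what that could look like (the Maven coordinates, class name, and jar path below are placeholders, not values from this thread):

```shell
# Hypothetical example: swap in the Maven coordinates your job needs.
spark-submit \
    --conf spark.jars.packages=org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1 \
    --class org.example.MyApp \
    my-app.jar
```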
Though I have created the Kubernetes RBAC as per the Spark site in my GKE
cluster, I'm getting a POD NAME null error.
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit
--serviceaccount=default:spark --namespace=default
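Alongside the RBAC setup above, the driver usually also needs to be told which service account to use via spark-submit configuration; a missing or inconsistent submit configuration is one possible cause of a null pod name (an assumption here, since the full submit command isn't shown). A sketch with placeholder API server address and image name:

```shell
# Hypothetical spark-submit fragment: the API server address and
# container image are placeholders -- fill in your own values.
spark-submit \
    --master k8s://https://<k8s-apiserver>:443 \
    --deploy-mode cluster \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.container.image=<your-spark-image> \
    ...
```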
On Thu, Feb 17, 2022 at 11:31 PM
Hi Mich
This is the latest error I'm stuck with. Please help me resolve this issue.
Exception in thread "main"
io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create]
for kind: [Pod] with name: [null] in namespace: [default] failed.
Hi,
The following excellent documentation may help as well:
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#using-foreach-and-foreachbatch
The book from Dr. Zaharia on SPARK does a fantastic job in explaining the
fundamental thinking behind these concepts.
Hi Gnana,
That JAR file /home/gnana_kumar123/spark/spark-3.2.1-bin-hadoop3.2/examples/jars/spark-examples_2.12-3.2.1.jar
is not visible to the GKE cluster, so not all nodes can read it. I suggest
that you put it in a gs:// bucket in GCP and access it from there.
HTH
Hi,
can you please post a screen shot of the exact CAST statement that you are
using? Did you use the SQL method mentioned by me earlier?
Regards,
Gourav Sengupta
On Thu, Feb 17, 2022 at 12:17 PM Rico Bergmann wrote:
> hi!
>
> Casting another int column that is not a partition column fails
Hi There,
I'm getting the below error even though I pass --class and --jars values
while submitting a Spark job through spark-submit.
Please help.
Exception in thread "main" org.apache.spark.SparkException: Failed to get
main class in JAR with error 'File file:/home/gnana_kumar123/spark/ does
not
hi!
Casting another int column that is not a partition column fails with the same
error.
The Schema before the cast (column names are anonymized):
root
 |-- valueObject: struct (nullable = true)
 |    |-- value1: string (nullable = true)
 |    |-- value2: string (nullable = true)
 |    |--
OK, that sounds reasonable.
In the code below
# Aggregation code in Alarm call, which uses withWatermark
def computeCount(df_processedAlarm, df_totalAlarm):
    processedAlarmCnt = None
    if df_processedAlarm.count() > 0:
        processedAlarmCnt =
Can you try to cast any other Int field which is NOT a partition column?
On Thu, 17 Feb 2022 at 7:34 pm, Gourav Sengupta
wrote:
> Hi,
>
> This appears interesting, casting INT to STRING has never been an issue
> for me.
>
> Can you just help us with the output of : df.printSchema() ?
>
> I
Hi,
This appears interesting, casting INT to STRING has never been an issue for
me.
Can you just help us with the output of df.printSchema()?
I prefer to use SQL, and the method I use for casting is: CAST(<> AS STRING) <>.
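To illustrate that CAST pattern concretely, here is a self-contained sketch using Python's sqlite3 (chosen only because it runs anywhere; Spark SQL would use STRING where SQLite uses TEXT, and the table and column names are made up for the example):

```python
import sqlite3

# Minimal illustration of the SQL CAST pattern; in Spark SQL the target
# type would be STRING rather than SQLite's TEXT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (part_col INTEGER)")
conn.execute("INSERT INTO t VALUES (42)")
row = conn.execute(
    "SELECT CAST(part_col AS TEXT) AS part_col_str FROM t"
).fetchone()
```

In Spark SQL the equivalent would read SELECT CAST(part_col AS STRING) AS part_col_str FROM t.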
Regards,
Gourav
On Thu, Feb 17, 2022 at 6:02 AM Rico Bergmann