Re: Spark Structured Streaming using withWatermark - TypeError: 'module' object is not callable

2022-02-16 Thread karan alang
Hi Mich, the issue was related to incorrect usage and is now resolved. However, wrt your comment 'OK sounds like your watermark is done outside of your processing': in my use case, which primarily deals with syslogs, a syslog is a string which needs to be parsed (with defensive coding built in to ensure

Re: Cast int to string not possible?

2022-02-16 Thread Rico Bergmann
Here is the code snippet:

var df = session.read().parquet(basepath);
for (Column partition : partitionColumnsList) {
    df = df.withColumn(partition.getName(),
        df.col(partition.getName()).cast(partition.getType()));
}

Column is a class containing schema information, like for example the name of

Re: Cast int to string not possible?

2022-02-16 Thread Morven Huang
Hi Rico, do you have any code snippet? I have no problem casting int to string. > On Feb 17, 2022, at 12:26 AM, Rico Bergmann wrote: > > Hi! > > I am reading a partitioned dataFrame into spark using automatic type > inference for the partition columns. For one partition column the data > contains an integer,

[Spark SQL] Is there any free ODBC driver

2022-02-16 Thread Rostyslav Myroshnychenko
Hi, We are trying to leverage Spark SQL as a federated query engine, but we are struggling to find any ODBC driver that doesn't require a subscription/license. Has anyone seen a free driver? Thanks, Rostyslav

Deploying docker images in Google Kubernetes engines

2022-02-16 Thread Mich Talebzadeh
Hi friends, Please note that you cannot use just any docker image in a Google Kubernetes cluster. It needs to be in a given format! Look at my article mentioned below for building docker images. Try to standardize the image with the Spark version, the Scala version, the Java version,

Cast int to string not possible?

2022-02-16 Thread Rico Bergmann
Hi! I am reading a partitioned dataFrame into spark using automatic type inference for the partition columns. For one partition column the data contains an integer, therefor Spark uses IntegerType for this column. In general this is supposed to be a StringType column. So I tried to cast this
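A minimal PySpark sketch of the cast Rico describes (the path and partition column name "day" are made up for illustration). If the inferred partition type is unwanted in the first place, partition column type inference can also be disabled:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.master("local[1]").appName("cast-sketch").getOrCreate()

# Hypothetical dataset partitioned by e.g. day=20220216; Spark's automatic
# inference will type the "day" partition column as IntegerType.
df = spark.read.parquet("/data/mytable")

# Cast the partition column back to string explicitly.
df = df.withColumn("day", df["day"].cast(StringType()))

# Alternatively, switch off inference before reading so partition columns
# are kept as strings:
# spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "false")
```

Note that `withColumn` with an existing column name replaces the column in place, which matches the loop in Rico's Java snippet.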

Re: Implementing circuit breaker pattern in Spark

2022-02-16 Thread Mich Talebzadeh
Are you going to terminate the spark process if the queue is empty?

Re: Implementing circuit breaker pattern in Spark

2022-02-16 Thread Gourav Sengupta
Hi, From a technical perspective I think that we all agree, there are no arguments. From a design/architecture point of view, given that big data was supposed to solve design challenges on volume, velocity, veracity, and variety, and companies usually investing in data solutions build them to

Re: Implementing circuit breaker pattern in Spark

2022-02-16 Thread Sean Owen
There's nothing wrong with calling microservices this way. Something needs to call the service with all the data arriving, and Spark is fine for executing arbitrary logic including this kind of thing. Kafka does not change that? On Wed, Feb 16, 2022 at 9:24 AM Gourav Sengupta wrote:

Re: Which manufacturers' GPUs support Spark?

2022-02-16 Thread Gourav Sengupta
Hi, 100% agree with Sean, the entire RAPIDS solution is built by wonderful people from NVIDIA. Just out of curiosity, if you are using AWS then EMR already supports RAPIDS; please try to use that. AWS has cheaper GPUs which can be used for testing solutions. For certain operations on Spark the

Re: Implementing circuit breaker pattern in Spark

2022-02-16 Thread Gourav Sengupta
Hi, once again, just trying to understand the problem first. Why are we using Spark to place calls to microservices? There are several reasons why this should never happen, including costs/security/scalability concerns, etc. Is there a way that you can create a producer and put the data into

Re: Implementing circuit breaker pattern in Spark

2022-02-16 Thread S
No, I want the job to stop and end once it discovers on repeated retries that the microservice is not responding. But I think I got where you were going right after sending my previous mail. Basically, repeated failure of your tasks on retries ultimately fails your job anyway. So that's an in-built

Re: Implementing circuit breaker pattern in Spark

2022-02-16 Thread Sean Owen
You stop the Spark job by tasks failing repeatedly, that's already how it works. You can't kill the driver from the executor other ways, but should not need to. I'm not clear, you're saying you want to stop the job, but also continue processing? On Wed, Feb 16, 2022 at 7:58 AM S wrote: >

Re: Implementing circuit breaker pattern in Spark

2022-02-16 Thread S
Retries have been already implemented. The question is how to stop the spark job by having an executor JVM send a signal to the driver JVM. e.g. I have a microbatch of 30 messages; 10 in each of the 3 partitions. Let's say while a partition of 10 messages was being processed, first 3 went through

Re: Implementing circuit breaker pattern in Spark

2022-02-16 Thread Sean Owen
You could use the same pattern in your flatMap function. If you want Spark to keep retrying though, you don't need any special logic, that is what it would do already. You could increase the number of task retries though; see the spark.excludeOnFailure.task.* configurations. You can just
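Sean's suggestion (retry inside the flatMap function, and let exhausted retries fail the task) could be sketched like this in plain Python; the function and parameter names are made up for illustration:

```python
def call_with_retries(func, arg, max_attempts=3):
    """Invoke func(arg), retrying up to max_attempts times.

    On exhaustion, re-raise the last error so the surrounding Spark task
    fails; once Spark's own task retries are exhausted too, the job fails,
    which is the desired 'stop the job' behaviour from the thread.
    """
    last_exc = None
    for _ in range(max_attempts):
        try:
            return func(arg)
        except Exception as exc:  # in real code, catch the service client's error type
            last_exc = exc
    raise last_exc
```

Inside the job this would be used as something like `rdd.flatMap(lambda rec: call_with_retries(call_service, rec))`, where `call_service` is the (hypothetical) microservice client call.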

Re: Which manufacturers' GPUs support Spark?

2022-02-16 Thread Sean Owen
Spark itself does not use GPUs, and is agnostic to what GPUs exist on a cluster, scheduled by the resource manager, and used by an application. In practice, virtually all GPU-related use cases (for deep learning for example) use CUDA, and this is NVIDIA-specific. Certainly, RAPIDS is from NVIDIA.
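For reference, enabling the RAPIDS Accelerator that Sean mentions is typically a matter of adding the plugin jar and a couple of configs at submit time. This is a config sketch, not a full recipe: the jar name uses a `<version>` placeholder, and the exact artifact and supported options should be checked against the spark-rapids documentation.

```shell
spark-submit \
  --jars rapids-4-spark_2.12-<version>.jar \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  your_app.py
```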

restoring SQL text from logical plan

2022-02-16 Thread Wang Cheng
I'm implementing the materialized feature for Spark. I have built a customized listener that logs the logical plan and physical plan of each SQL query. After some analysis, I can get the most valuable subtree that needs to be materialized. Then I need to restore the subtree of the plan back to

Which manufacturers' GPUs support Spark?

2022-02-16 Thread 15927907...@163.com
Hello, We have done some Spark GPU-accelerated work using the spark-rapids component (https://github.com/NVIDIA/spark-rapids). However, we found that this component currently only supports NVIDIA GPUs, and on the official Spark website we did not see any manufacturer's description of the GPU

Position for 'cf.content' not found in row

2022-02-16 Thread 潘明文
Hi, Could you help me with the below issue? Thanks! This is my source code:

SparkConf sparkConf = new SparkConf(true);
sparkConf.setAppName(ESTest.class.getName());
SparkSession spark = null;
sparkConf.setMaster("local[*]");
sparkConf.set("spark.cleaner.ttl", "3600");
sparkConf.set("es.nodes",

Implementing circuit breaker pattern in Spark

2022-02-16 Thread S
Hi, We have a spark job that calls a microservice in the lambda function of the flatMap transformation: it passes the inbound element to this microservice and returns the transformed value, or "None", from the microservice as the output of this flatMap transform. Of course
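The circuit-breaker pattern named in the subject could be sketched as follows; this is a hypothetical, minimal in-process breaker (class and threshold are illustrative, not from the thread's code). After `threshold` consecutive failures it "opens" and fails fast instead of calling the service:

```python
class CircuitBreaker:
    """Minimal circuit-breaker sketch: open after N consecutive failures."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0  # consecutive-failure count

    def call(self, func, arg):
        # Fail fast while the breaker is open.
        if self.failures >= self.threshold:
            raise RuntimeError("circuit open: downstream service unavailable")
        try:
            result = func(arg)
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # any success closes the breaker again
        return result
```

One caveat when using anything like this from Spark: the breaker state lives in each executor process, so every executor trips (or resets) independently; there is no single job-wide breaker unless you coordinate externally.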

Re: Spark Structured Streaming using withWatermark - TypeError: 'module' object is not callable

2022-02-16 Thread Mich Talebzadeh
OK sounds like your watermark is done outside of your processing. Check this:

# construct a streaming dataframe streamingDataFrame that subscribes to topic temperature
streamingDataFrame = self.spark \
    .readStream \
    .format("kafka") \
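To illustrate the point about applying the watermark inside the processing, here is a hedged PySpark sketch: the broker address, topic name, JSON payload fields (`ts`, `temp`), and window sizes are all assumptions for illustration. The watermark goes on the parsed event-time column, after the Kafka value has been decoded:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("watermark-sketch").getOrCreate()

# Subscribe to the Kafka topic (broker address is a placeholder).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "temperature")
       .load())

# Parse the value first (hypothetical JSON payload with a timestamp field),
# then watermark on the parsed event-time column.
parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.get_json_object("json", "$.ts").cast("timestamp").alias("event_time"),
                  F.get_json_object("json", "$.temp").cast("double").alias("temp")))

agg = (parsed
       .withWatermark("event_time", "10 minutes")
       .groupBy(F.window("event_time", "5 minutes"))
       .agg(F.avg("temp").alias("avg_temp")))
```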