Re: Extract value from streaming Dataframe to a variable

2020-01-21 Thread Nick Dawes
You can use foreachBatch to apply your batch operations on each output of a micro-batch: http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#using-foreach-and-foreachbatch Hope this helps. Thanks, Jungtaek Lim (HeartSaVioR)
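
A minimal sketch of the foreachBatch approach the reply points to, tied to the REST-verification goal from the question below. The agentName column comes from the original thread; the endpoint URL and the decision logic are hypothetical placeholders:

    import requests

    def process_batch(batch_df, batch_id):
        # Inside foreachBatch, batch_df is a plain (non-streaming) DataFrame,
        # so actions like collect() are allowed.
        rows = batch_df.select("agentName").distinct().collect()
        for row in rows:
            # Hypothetical REST check; swap in the real verification endpoint.
            resp = requests.get("https://example.com/verify",
                                params={"agent": row["agentName"]})
            if not resp.ok:
                pass  # decision-making logic goes here

    query = kinesisDF.writeStream.foreachBatch(process_batch).start()

Each micro-batch is handed to process_batch as an ordinary DataFrame, which is what makes the collect()-style extraction from the original question legal again.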

Re: Extract value from streaming Dataframe to a variable

2020-01-20 Thread Nick Dawes
Streaming experts, any clues on how to achieve this? After extracting a few variables, I need to run them through a REST API for verification and decision making. Thanks for your help. Nick

Extract value from streaming Dataframe to a variable

2020-01-17 Thread Nick Dawes
I need to extract a value from a PySpark structured streaming Dataframe to a string variable to check something. I tried this code: agentName = kinesisDF.select(kinesisDF.agentName.getItem(0).alias("agentName")).collect()[0][0] This works on a non-streaming Dataframe only. On a streaming Dataframe, actions like collect() fail because queries with streaming sources must be started with writeStream.

Re: Structured Streaming Dataframe Size

2019-08-28 Thread Nick Dawes
Structured Streaming does not materialize the entire table. It reads the latest available data from the streaming source, processes it incrementally to update the result, and then discards the source data. It only keeps around the minimal intermediate *state* data as required to update the result (e.g. intermediate counts in the earlier example).
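
As a concrete illustration of the state the reply describes, a windowed count with a watermark keeps only the per-window counts as intermediate state and lets Spark drop state older than the watermark. A sketch with made-up names (streamingDF, eventTime, agentName):

    from pyspark.sql import functions as F

    counts = (streamingDF
              .withWatermark("eventTime", "10 minutes")  # state older than this may be dropped
              .groupBy(F.window("eventTime", "5 minutes"), "agentName")
              .count())  # only these running counts are kept as state

    query = counts.writeStream.outputMode("update").format("console").start()

The source rows themselves are discarded once folded into the counts, so state stays bounded even though the logical input table is unbounded.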

Structured Streaming Dataframe Size

2019-08-27 Thread Nick Dawes
I have a quick newbie question. Spark Structured Streaming creates an unbounded dataframe that keeps appending rows to it. So what's the max size of data it can hold? What if the data grows bigger than the JVM heap? Will it spill to disk? I'm using S3 as storage, so will it write temp data to S3 or to local disk?

Spark Structured Streaming XML content

2019-08-14 Thread Nick Dawes
I'm trying to analyze data using the Kinesis source in PySpark Structured Streaming on Databricks. Created a Dataframe as shown below: kinDF = spark.readStream.format("kinesis").option("streamName", "test-stream-1").load() Converted the data from base64 encoding as below. df =
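
The snippet cuts off at the conversion step. A minimal sketch of one common way to do it, assuming the Kinesis connector exposes the record payload as a binary data column; the unbase64 branch applies only if the payload is additionally base64-encoded text:

    from pyspark.sql import functions as F

    # If the payload is plain binary, a cast to string is enough:
    df = kinDF.select(F.col("data").cast("string").alias("xml_payload"))

    # If the payload is a base64-encoded string instead:
    # df = kinDF.select(
    #     F.unbase64(F.col("data").cast("string")).cast("string").alias("xml_payload"))

From there the xml_payload string can be parsed with an XML library or the spark-xml package.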

Re: Spark Image resizing

2019-07-31 Thread Nick Dawes
Any other way of resizing the image before creating the DataFrame in Spark? I know OpenCV does it, but I don't have OpenCV on my cluster. I have Anaconda Python packages installed on my cluster. Any ideas will be appreciated. Thank you!
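
A minimal sketch of one way to do it with Pillow, which ships with Anaconda, resizing the raw bytes in a UDF. The 224x224 target, the binaryFile source (available on Databricks and Spark 3.0+), and the column names are assumptions for the example:

    import io
    from PIL import Image
    from pyspark.sql.functions import udf
    from pyspark.sql.types import BinaryType

    def resize_bytes(raw):
        # Decode with Pillow, resize, re-encode as PNG bytes.
        img = Image.open(io.BytesIO(raw))
        img = img.resize((224, 224))
        buf = io.BytesIO()
        img.save(buf, format="PNG")
        return buf.getvalue()

    resize_udf = udf(resize_bytes, BinaryType())

    # binaryFile exposes the raw file bytes in a "content" column.
    df = (spark.read.format("binaryFile").load(imageDir)
          .withColumn("resized", resize_udf("content")))

An alternative, if the files must be shrunk before Spark sees them at all, is a plain Python pass with Pillow over the source directory.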

Spark Image resizing

2019-07-30 Thread Nick Dawes
Hi, I'm new to Spark's image data source. After creating a dataframe using it, I would like to resize the images in PySpark: df = spark.read.format("image").load(imageDir) Can you please help me with this? Nick

Spark on Kubernetes Authentication error

2019-06-06 Thread Nick Dawes
Hi there, I'm trying to run Spark on EKS. I created an EKS cluster, added nodes, and then tried to submit a Spark job from an EC2 instance. I ran the following commands for access:

    kubectl create serviceaccount spark
    kubectl create clusterrolebinding spark-role --clusterrole=admin
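
The snippet cuts off mid-command, but the usual companion step is to point spark-submit at that service account via the documented spark.kubernetes.authenticate.driver.serviceAccountName conf. A sketch with the API server endpoint, container image, and application jar left as placeholders:

    spark-submit \
      --master k8s://https://<EKS-API-SERVER-ENDPOINT>:443 \
      --deploy-mode cluster \
      --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
      --conf spark.kubernetes.container.image=<your-spark-image> \
      local:///<path-to-application-jar>

Without this conf the driver pod falls back to the default service account, which typically lacks permission to create executor pods and surfaces as an authentication/authorization error.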