each output of micro-batch:
>
> http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#using-foreach-and-foreachbatch
>
> Hope this helps.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
>
> On Mon, Jan 20, 2020 at 8:43 PM Nick Dawes wrote:
>
Streaming experts, any clues how to achieve this?
After extracting few variables, I need to run them through a REST API for
verification and decision making.
Thanks for your help.
Nick
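Following the foreachBatch suggestion quoted earlier in the thread, one way to get per-row values out of a streaming query is to do the collect() inside the batch function, where each micro-batch is a plain DataFrame. A minimal sketch; the `agentName` column and the REST-verification step are assumptions from this thread, not a fixed API:

```python
# Sketch of the foreachBatch approach. Inside foreachBatch, each
# micro-batch is a normal (non-streaming) DataFrame, so collect() works.

def batch_agent_names(batch_df):
    # Pull the agentName values of one micro-batch into plain Python.
    return [row["agentName"] for row in batch_df.select("agentName").collect()]

def check_batch(batch_df, batch_id):
    # For each extracted value, call the (hypothetical) REST verification.
    for name in batch_agent_names(batch_df):
        print("verify:", name)  # placeholder for the REST API call

# Wiring (PySpark, only runnable inside a Spark session):
# query = kinesisDF.writeStream.foreachBatch(check_batch).start()
```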
On Fri, Jan 17, 2020, 6:27 PM Nick Dawes wrote:
I need to extract a value from a PySpark structured streaming Dataframe to
a string variable to check something.
I tried this code.
agentName = kinesisDF.select(kinesisDF.agentName.getItem(0).alias("agentName")).collect()[0][0]
This works on a non-streaming Dataframe only. On a streaming Dataframe,
collect() raises an AnalysisException (queries with streaming sources must
be executed with writeStream.start()).
>> incrementally to update the result, and then discards the
>> source data. It only keeps around the minimal intermediate *state* data
>> as required to update the result (e.g. intermediate counts in the earlier
>> example).
>>
>
>
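The incremental model described in the quoted passage can be illustrated with a toy running aggregation in plain Python (not Spark itself): only the per-key counts survive from batch to batch, and the input rows themselves are discarded.

```python
# Toy illustration of incremental stateful aggregation: process each
# micro-batch, update minimal state (a count per key), drop the rows.

def update_counts(state, batch):
    # state: dict mapping key -> running count (the only data retained)
    for key in batch:
        state[key] = state.get(key, 0) + 1
    return state  # the batch rows themselves are not kept anywhere

state = {}
state = update_counts(state, ["cat", "dog", "cat"])
state = update_counts(state, ["dog"])
# state is now {"cat": 2, "dog": 2}
```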
> On Tue, Aug 27, 2019 at 1:21 PM Nick Dawes wrote:
I have a quick newbie question.
Spark Structured Streaming creates an unbounded dataframe that keeps
appending rows to it.
So what's the max size of data it can hold? What if the data becomes bigger
than the JVM heap? Will it spill to disk? I'm using S3 as storage. So will it
write temp data on S3 or
I'm trying to analyze data using Kinesis source in PySpark Structured
Streaming on Databricks.
Created a Dataframe as shown below.
kinDF = spark.readStream.format("kinesis").option("streamName",
"test-stream-1").load()
Converted the data from base64 encoding as below.
df =
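The conversion itself is plain base64 decoding; in plain Python it looks like this (the JSON payload below is made up purely for illustration):

```python
import base64

# Kinesis record payloads arrive base64-encoded; decoding back to a
# UTF-8 string is a round trip through the base64 module.
encoded = base64.b64encode(b'{"agentName": ["Alice"]}')
decoded = base64.b64decode(encoded).decode("utf-8")
# decoded == '{"agentName": ["Alice"]}'
```

In PySpark itself, the equivalent step is usually a cast of the binary `data` column to string, e.g. `kinDF.selectExpr("CAST(data AS STRING)")` — an assumption about the source schema here, since the original line is cut off.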
Any other way of resizing the image before creating the DataFrame in Spark?
I know opencv does it. But I don't have opencv on my cluster. I have
Anaconda python packages installed on my cluster.
Any ideas will be appreciated. Thank you!
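Since the Anaconda distribution ships Pillow, one option that avoids OpenCV is to resize with PIL inside a UDF over the raw image bytes. A minimal sketch; the column names, target size, and `binaryFile` reading path are assumptions, not a verified recipe:

```python
from io import BytesIO

from PIL import Image


def resize_image_bytes(img_bytes, size=(64, 64)):
    # Decode raw image bytes with Pillow, resize, re-encode as PNG.
    img = Image.open(BytesIO(img_bytes))
    out = BytesIO()
    img.resize(size).save(out, format="PNG")
    return out.getvalue()

# PySpark wiring sketch (assumes reading the files as binary, which
# avoids unpacking the image data source's struct column):
# from pyspark.sql.functions import udf
# from pyspark.sql.types import BinaryType
# resize_udf = udf(resize_image_bytes, BinaryType())
# df = (spark.read.format("binaryFile").load(imageDir)
#         .withColumn("resized", resize_udf("content")))
```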
On Tue, Jul 30, 2019, 4:17 PM Nick Dawes wrote:
>
Hi
I'm new to spark image data source.
After creating a dataframe using Spark's image data source, I would like to
resize the images in PySpark.
df = spark.read.format("image").load(imageDir)
Can you please help me with this?
Nick
Hi there,
I'm trying to run Spark on EKS. Created an EKS cluster, added nodes and
then trying to submit a Spark job from an EC2 instance.
Ran following commands for access:

kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=admin
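For reference, the Spark-on-Kubernetes docs pair a service account like this with a spark-submit along these lines. A sketch only: the API server endpoint, container image, and jar path are placeholders, not values from this thread:

```shell
# Sketch of submitting to the EKS API server (placeholders throughout).
spark-submit \
  --master k8s://https://EKS_API_SERVER_ENDPOINT:443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.container.image=YOUR_SPARK_IMAGE \
  local:///opt/spark/examples/jars/spark-examples.jar
```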