Re: Graceful shutdown SPARK Structured Streaming

2023-02-08 Thread Brian Wylie
It's been a few years (so this approach might be out of date), but here's what I used for PySpark as part of this SO answer (https://stackoverflow.com/questions/45717433/stop-structured-streaming-query-gracefully/65708677): ``` # Helper method to stop a streaming query def stop_stream_query(query,
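
The snippet above is cut off by the archive, so the following is only a minimal sketch of the general idea, not the code from the linked answer. It assumes the query object behaves like a PySpark `StreamingQuery` (an `isActive` property, a `status` dict with `isDataAvailable` and `isTriggerActive` keys, and a `stop()` method). The names `stop_when_idle`, `poll_seconds`, `max_wait_seconds`, and the `FakeQuery` stand-in are hypothetical and exist only for illustration.

```python
import time


def stop_when_idle(query, poll_seconds=0.5, max_wait_seconds=60.0):
    """Stop a streaming query at a quiet moment.

    Polls the query and calls stop() once it reports no pending data and
    no active trigger. If that never happens before the deadline, the
    query is stopped anyway.

    Returns True if the query was idle (or already inactive) when stopped,
    False if the deadline forced the stop.
    """
    deadline = time.monotonic() + max_wait_seconds
    while query.isActive:
        status = query.status  # read once per poll
        idle = not status["isDataAvailable"] and not status["isTriggerActive"]
        if idle:
            query.stop()
            return True
        if time.monotonic() >= deadline:
            query.stop()
            return False
        time.sleep(poll_seconds)
    return True


class FakeQuery:
    """Hypothetical stand-in for a StreamingQuery, for dry runs without Spark."""

    def __init__(self, busy_polls):
        self._busy_polls = busy_polls  # number of polls that report work in flight
        self.isActive = True
        self.stopped = False

    @property
    def status(self):
        busy = self._busy_polls > 0
        self._busy_polls -= 1
        return {"message": "fake", "isDataAvailable": busy, "isTriggerActive": busy}

    def stop(self):
        self.isActive = False
        self.stopped = True


if __name__ == "__main__":
    q = FakeQuery(busy_polls=2)
    print(stop_when_idle(q, poll_seconds=0.0))
```

With a real job, the object returned by `writeStream...start()` would be passed in place of `FakeQuery`. The deadline is a design choice of this sketch so that a query that is never idle still gets stopped.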

Re: [Spark Structured Streaming] Processing the data path coming from kafka.

2021-01-18 Thread Brian Wylie
Coming in late, but if I understand correctly, you can simply use the fact that spark.read (or readStream) will also accept a directory argument. If you provide a directory, Spark will automagically pull in all the files in that directory. """Reading in multiple files example""" spark =

pyspark histogram

2017-09-27 Thread Brian Wylie
Hi All, My Google/SO searching is somehow failing on this. I simply want to compute histograms for a column in a Spark dataframe. There are two SO hits on this question: - https://stackoverflow.com/questions/39154325/pyspark-show-histogram-of-a-data-frame-column -

RE: plotting/resampling timeseries data

2017-09-22 Thread Brian Wylie
@vermanuraq Great thanks, just what I needed.. I knew I was missing something simple. Cheers, -brian

Re: Python UDF to convert timestamps (performance question)

2017-08-30 Thread Brian Wylie
> Something like this *col("ts").cast("timestamp")* > > On Wed, Aug 30, 2017 at 11:45 AM, Brian Wylie <briford.wy...@gmail.com> > wrote: > >> Hi All, >> >> I'm using structured streaming in Spark 2.2. >> >> I'm usi

Python UDF to convert timestamps (performance question)

2017-08-30 Thread Brian Wylie
here are the questions: - Will the creation of a new dataframe withColumn basically kill performance? - Should I move my UDF into the parsed_data.select(...) part? - Can my UDF be done by spark.sql directly? (I tried to_timestamp but without luck) Any suggestions/pointers are greatly appreciated. -Brian Wylie

Re: PySpark, Structured Streaming and Kafka

2017-08-24 Thread Brian Wylie
e your Python file. > > On Wed, Aug 23, 2017 at 1:41 PM, Brian Wylie <briford.wy...@gmail.com> > wrote: > >> Hi All, >> >> I'm trying the new hotness of using Kafka and Structured Streaming. >> >> Resources that I've looked at >> - https://spark.apac

Re: PySpark, Structured Streaming and Kafka

2017-08-23 Thread Brian Wylie
.@databricks.com > wrote: > You can use `bin/pyspark --packages > org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0` > to start "pyspark". If you want to use "spark-submit", you also need to > provide your Python file. > > On Wed, Aug 23, 2017 at 1:
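
For a notebook session, where there is no `bin/pyspark` command line to add flags to, one option is to pass the same `--packages` coordinate through the `PYSPARK_SUBMIT_ARGS` environment variable before the Spark session is created. This is a sketch under that assumption and is not taken from the thread. The helper names `kafka_package` and `set_submit_args` are hypothetical, and the default versions simply mirror the coordinate quoted above.

```python
import os


def kafka_package(scala_version="2.11", spark_version="2.2.0"):
    """Build the Maven coordinate for the Structured Streaming Kafka source."""
    return f"org.apache.spark:spark-sql-kafka-0-10_{scala_version}:{spark_version}"


def set_submit_args(package):
    """Ask PySpark to fetch `package` when it launches the JVM.

    This must run before the first SparkSession/SparkContext is created.
    The trailing 'pyspark-shell' token is required by PySpark's launcher.
    """
    os.environ["PYSPARK_SUBMIT_ARGS"] = f"--packages {package} pyspark-shell"
    return os.environ["PYSPARK_SUBMIT_ARGS"]


if __name__ == "__main__":
    print(set_submit_args(kafka_package()))
```

The Scala and Spark versions in the coordinate need to match the installed Spark build, which is why they are parameters here.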

PySpark, Structured Streaming and Kafka

2017-08-23 Thread Brian Wylie
e.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241) at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160) at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:274) at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151) at org.apache.spark.launcher.Main.main(Main.java:86) Anyway, all my code/versions/etc are in this notebook: - https://github.com/Kitware/BroThon/blob/master/notebooks/Bro_to_Spark.ipynb I'd be tremendously appreciative of some super nice, smart person if they could point me in the right direction :) -Brian Wylie

Re: Question about 'Structured Streaming'

2017-08-08 Thread Brian Wylie
N support to > read bro logs, rather than a python library. This is likely to have much > better performance since we can do all of the parsing on the JVM without > having to flow it though an external python process. > > On Tue, Aug 8, 2017 at 9:35 AM, Brian Wylie <briford.wy...@gmai

Fwd: Python question about 'Structured Streaming'

2017-08-08 Thread Brian Wylie
Hi All, I've read the new information about Structured Streaming in Spark, looks super great. Resources that I've looked at - https://spark.apache.org/docs/latest/streaming-programming-guide.html - https://databricks.com/blog/2016/07/28/structured-streaming-in-apache-spark.html -

Question about 'Structured Streaming'

2017-08-08 Thread Brian Wylie
Hi All, I've read the new information about Structured Streaming in Spark, looks super great. Resources that I've looked at - https://spark.apache.org/docs/latest/streaming-programming-guide.html - https://databricks.com/blog/2016/07/28/structured-streaming-in-apache-spark.html -