Tathagata,
Thanks, your explanation was great.
The suggestion worked well. The only minor detail was that I needed to bring
the TS field in as a DoubleType() or the time got truncated.
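For anyone hitting the same thing: the truncation is just integer conversion dropping the fractional seconds. A quick plain-Python illustration, using a value shaped like the one in the original message below:

```python
from datetime import datetime, timezone

# Float epoch timestamp in the same shape as the one from the Kafka payload.
ts = 1379288667.631940

# Reading it through an integer type drops the fractional seconds entirely:
seconds_only = int(ts)  # sub-second precision is gone

# Kept as a double, the sub-second part survives the conversion:
full = datetime.fromtimestamp(ts, tz=timezone.utc)
truncated = datetime.fromtimestamp(seconds_only, tz=timezone.utc)

print(full.isoformat())       # microseconds preserved
print(truncated.isoformat())  # microseconds are zero
```

The same logic applies inside Spark: a schema that reads TS as an integer or long type discards everything after the decimal point before the cast to timestamp ever happens.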
Thanks again,
-Brian
On Wed, Aug 30, 2017 at 1:34 PM, Tathagata Das wrote:
1. Generally, adding columns, etc. will not affect performance, because
Spark's optimizer will automatically figure out which columns are not needed
and eliminate them in the optimization step. So that should never be a concern.
2. Again, this is generally not a concern as the optimizer will take
Hi All,
I'm using structured streaming in Spark 2.2.
I'm using PySpark and I have data (from a Kafka publisher) where the
timestamp is a float that looks like this: 1379288667.631940
So here's my code (which is working fine):
# SUBSCRIBE: Setup connection to Kafka Stream
raw_data =