In true Murphy's Law fashion, right after asking the question I discovered
the solution:
The JDBC URL should set the zeroDateTimeBehavior option.
https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-configuration-properties.html
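For illustration, a minimal sketch of what the fixed URL looks like (host and
database name are hypothetical):

// convertToNull makes Connector/J return NULL for zero DATETIME values like
// '0000-00-00 00:00:00' instead of throwing; 'round' is the other non-default
// choice, and 'exception' is the default.
val url = "jdbc:mysql://db-host:3306/mydb?zeroDateTimeBehavior=convertToNull"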
Hi,
We have a legacy process that scrapes a MySQL database. The Spark job uses
the DataFrame API and the MySQL JDBC driver to read the tables and save them
as JSON files. One table has DATETIME columns containing values that are
invalid for java.sql.Timestamp, so reading it throws an exception.
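Combining this with the zeroDateTimeBehavior fix above, a minimal sketch of
such a job (table name, credentials, and output path are hypothetical):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("mysql-to-json").getOrCreate()

// Read one table over JDBC; zeroDateTimeBehavior=convertToNull turns the
// DATETIME values that java.sql.Timestamp cannot represent into NULLs.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://db-host:3306/mydb?zeroDateTimeBehavior=convertToNull")
  .option("dbtable", "legacy_table")
  .option("user", "reader")
  .option("password", sys.env("DB_PASSWORD"))
  .load()

// Dump the table as JSON files.
df.write.mode("overwrite").json("/data/export/legacy_table")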
Hi everyone!
I just published a blog post on how Spark Scala custom transformations
can be rearranged so they compose better and can be used with .transform:
https://medium.com/@dmateusp/dataframe-transform-spark-function-composition-eb8ec296c108
I found the discussions in this group to be very helpful.
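For anyone who hasn't seen the pattern, a minimal sketch of the idea, with
hypothetical function and column names (not code from the post):

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().appName("transform-demo").master("local[*]").getOrCreate()

// Currying the parameters so each transformation ends up as a plain
// DataFrame => DataFrame lets them chain cleanly through .transform.
def withConstant(name: String, value: String)(df: DataFrame): DataFrame =
  df.withColumn(name, lit(value))

val result = spark.range(3).toDF("id")
  .transform(withConstant("greeting", "hello"))
  .transform(withConstant("farewell", "goodbye"))

result.show()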
I'm trying to run sample code that reads a file from S3, so I need the AWS
SDK and AWS Hadoop dependencies.
If I assemble these dependencies into the main jar, everything works fine. But
when I try using --packages, the dependencies are not visible to the pods.
This is my submit command:
spark-submit
--master
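For reference, this is roughly the shape of the full command; the API server
address, image name, jar path, and package versions below are hypothetical
placeholders, not the originals:

spark-submit \
  --master k8s://https://<k8s-api-host>:6443 \
  --deploy-mode cluster \
  --name s3-sample \
  --packages org.apache.hadoop:hadoop-aws:2.7.3,com.amazonaws:aws-java-sdk:1.7.4 \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///opt/spark/jars/sample.jar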
Nice to hear you're investigating the issue deeply.
Btw, if attaching code is not easy, maybe you could share the logical/physical
plan of any batch: "detail" in the SQL tab shows the plan as a string.
Plans from sequential batches would be very helpful, as would the streaming
query status for those batches.
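Assuming `query` is the handle returned by writeStream.start(), a minimal
sketch of pulling the same information from the API:

// Print the (extended) plans of the most recently executed batch.
query.explain(extended = true)

// Per-batch metrics (input rates, state rows, watermark) and current status.
println(query.lastProgress)
println(query.status)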
Hi Jungtaek
Thanks for your response!
I have actually set watermarks on all the streams A/B/C, with the respective
event-time columns A/B/C_LAST_MOD, so I think this should not be the reason.
Of course, the event time on the C stream (the "optional" one) progresses much
more slowly than on the others.
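In other words, the setup looks roughly like the following sketch (the stream
names and the 10-minute delay are placeholders for the real values):

// Each input stream declares a watermark on its own event-time column.
val aWithWm = streamA.withWatermark("A_LAST_MOD", "10 minutes")
val bWithWm = streamB.withWatermark("B_LAST_MOD", "10 minutes")
val cWithWm = streamC.withWatermark("C_LAST_MOD", "10 minutes")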
Hi All,
Is there a best practice around calculating daily, weekly, monthly,
quarterly, and yearly active users?
One approach is to build a daily bitmap per user and aggregate it by period
later. However, I was wondering if anyone has a better approach to tackling
this problem (a rough sketch of the window approach follows below).
--
Regards,
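To make the window idea concrete, a rough sketch using an approximate
distinct count over tumbling windows (the events DataFrame and its columns
are hypothetical):

import org.apache.spark.sql.functions.{approx_count_distinct, col, window}

// Daily active users; swapping "1 day" for "7 days" or "30 days" gives
// weekly/monthly counts with the same shape.
val dau = events
  .groupBy(window(col("event_time"), "1 day"))
  .agg(approx_count_distinct("user_id").as("active_users"))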
When using macOS, it is recommended to install Java, Scala, and Spark using
Homebrew.
Run these commands in a terminal:
brew update
brew install scala
brew install sbt
brew cask install java
brew install spark
There is no need to install HDFS; you can use your local file system
without a problem.
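For example, a quick smoke test in spark-shell against the local file system
(the path is hypothetical):

// Paths with the file:// scheme (or no scheme, by default in local mode)
// resolve against the local file system, so no HDFS is required.
val lines = spark.read.textFile("file:///tmp/sample.txt")
println(lines.count())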
Hi all,
Recently I ran './build/mvn test' for Spark on aarch64, and both master and
branch-2.4 failed; the log pieces are below:
..
[INFO] T E S T S
[INFO] ---
[INFO] Running org.apache.spark.util.kvstore.LevelDBTypeInfoSuite
[INFO] Tests
I would suspect that rows are never evicted from state in the second join. To
determine whether a row is NOT matched on the other side, Spark must check
whether the row was ever matched before it is evicted. You need to set a
watermark on either B_LAST_MOD or C_LAST_MOD.
If you already did that but just haven't shown it here,
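In any case, a minimal sketch of the point above (stream names, columns, and
the one-hour bound are hypothetical): both sides carry a watermark, and the
join condition bounds event time, which is what lets Spark decide that a row
can never match and evict its state.

import org.apache.spark.sql.functions.expr

val bWithWm = streamB.withWatermark("B_LAST_MOD", "10 minutes")
val cWithWm = streamC.withWatermark("C_LAST_MOD", "10 minutes")

// The event-time range condition gives Spark an upper bound on how long a
// B row must be kept before it can be emitted as unmatched and evicted.
val joined = bWithWm.join(
  cWithWm,
  expr("""
    B_ID = C_ID AND
    C_LAST_MOD BETWEEN B_LAST_MOD - INTERVAL 1 HOUR AND B_LAST_MOD + INTERVAL 1 HOUR
  """),
  "leftOuter")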