Sorry, I have no idea about Delta Lake. You may get a better answer from the
Delta Lake mailing list.
One thing is clear: stateful processing is an essential feature of almost
every streaming framework. If you're struggling with something around the
state feature and trying to find a
Hi,
In the Spark job, metrics are exported to a Prometheus HTTP server on
localhost, to be scraped later by the Prometheus service
(https://github.com/prometheus/client_java#http). The problem here is that
when I ssh to the EMR instances themselves, I can only see the metrics
(e.g. curl localhost:9111) on the driver locally
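For context, a minimal sketch of the client_java exporter pattern described
above (simpleclient-era API from the linked README; port 9111 taken from the
curl example, and records_processed_total is a hypothetical metric name). Note
that HTTPServer binds inside whichever JVM starts it, which is why an exporter
started in the driver is only reachable on the driver host:

    import io.prometheus.client.Counter;
    import io.prometheus.client.exporter.HTTPServer;

    public class DriverMetrics {
        // Hypothetical counter for illustration only.
        static final Counter recordsProcessed = Counter.build()
            .name("records_processed_total")
            .help("Records processed by the job.")
            .register();

        public static void main(String[] args) throws Exception {
            // Serves the default registry; scrape target: localhost:9111.
            HTTPServer server = new HTTPServer(9111);
            recordsProcessed.inc();
        }
    }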
Jungtaek,
How would you contrast stateful streaming with checkpoint vs. the idea of
writing updates to a Delta Lake table, and then using the Delta Lake table
as a streaming source for our state stream?
Thank you,
Bryan
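For readers unfamiliar with the second pattern in the question, a minimal
sketch of reading a Delta Lake table as a streaming source (assumes the
delta-core artifact and its SQL extensions are configured on the cluster;
"/tmp/state-table" is a hypothetical path):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class DeltaStateSource {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                .appName("delta-state")
                .getOrCreate();
            // The Delta table plays the role of the state stream here,
            // instead of Spark's built-in checkpointed state store.
            Dataset<Row> stateStream = spark.readStream()
                .format("delta")
                .load("/tmp/state-table");
            stateStream.printSchema();
        }
    }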
Hello,
I came across an issue[1] in PyHive which involves the SHOW TABLES output
from Thrift Server.
When you run a SHOW TABLES statement in beeline, it will return a table
with the following fields: (i) schema name, (ii) table name, (iii)
temporary table flag.
This output is different from
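The schema in question can be inspected directly; a minimal sketch (local
master assumed, and note that the exact column names vary by Spark version,
e.g. database or namespace, tableName, isTemporary):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class ShowTablesSchema {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                .appName("show-tables")
                .master("local[*]")
                .getOrCreate();
            Dataset<Row> tables = spark.sql("SHOW TABLES");
            // Expect the three fields described above: schema/database
            // name, table name, temporary flag.
            tables.printSchema();
            tables.show();
        }
    }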
I am confused by your question. Are you running the Spark cluster
on AWS EMR and trying to output the result to a Prometheus instance
running on your localhost? Isn't your localhost behind the firewall
and not accessible by AWS? What does it mean "have prometheus available
in
I'm trying to implement org.apache.spark.sql.expressions.Aggregator in Java.
Both the input and output columns are arrays of strings. I cannot figure
out how to construct a working encoder for the method outputEncoder() and
for the UDF registration. The data type on the Java side could be
Collection
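One sketch of a workaround, assuming Spark 3.x: the buffer never surfaces as
a column, so an opaque kryo encoder is fine there, while the output column
can use the array-of-strings encoder that Scala exposes via SQLImplicits,
which is also callable from Java. ConcatArrays is a hypothetical example
that concatenates all input arrays:

    import java.util.Arrays;
    import org.apache.spark.sql.Encoder;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.expressions.Aggregator;

    public class ConcatArrays extends Aggregator<String[], String[], String[]> {

        @Override
        public String[] zero() {
            return new String[0];
        }

        @Override
        public String[] reduce(String[] buffer, String[] input) {
            return concat(buffer, input);
        }

        @Override
        public String[] merge(String[] left, String[] right) {
            return concat(left, right);
        }

        @Override
        public String[] finish(String[] reduction) {
            return reduction;
        }

        // Internal buffer: stored as opaque binary, never exposed as a column.
        @Override
        public Encoder<String[]> bufferEncoder() {
            return Encoders.kryo(String[].class);
        }

        // Output column: use the implicits encoder so the result is a real
        // array<string> rather than binary (assumption: outputEncoder() is
        // invoked on the driver, where an active session exists).
        @Override
        public Encoder<String[]> outputEncoder() {
            return SparkSession.active().implicits().newStringArrayEncoder();
        }

        private static String[] concat(String[] a, String[] b) {
            String[] out = Arrays.copyOf(a, a.length + b.length);
            System.arraycopy(b, 0, out, a.length, b.length);
            return out;
        }
    }

Registration would then pass the same encoder as the input encoder, via
org.apache.spark.sql.functions.udaf (Spark 3.x):

    spark.udf().register("concat_arrays",
        functions.udaf(new ConcatArrays(),
            spark.implicits().newStringArrayEncoder()));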