Re: Query around Spark Checkpoints

2020-09-29 Thread Jungtaek Lim
Sorry, I have no idea about Delta Lake; you may get a better answer from the Delta Lake mailing list. One thing is clear: stateful processing is simply an essential feature of almost every streaming framework. If you're struggling with something around the state feature and trying to find a…
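(For context, checkpointing in Structured Streaming is enabled per query via the checkpointLocation option. A minimal Java sketch, using the built-in rate test source and a placeholder checkpoint path — not code from the thread:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;

    public class CheckpointSketch {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder()
                    .appName("checkpoint-sketch")
                    .getOrCreate();

            // A stateful aggregation: its state lives in the checkpoint directory.
            Dataset<Row> counts = spark.readStream()
                    .format("rate")   // built-in test source emitting (timestamp, value)
                    .load()
                    .groupBy("value")
                    .count();

            StreamingQuery query = counts.writeStream()
                    .outputMode("update")
                    .format("console")
                    // State and progress are recovered from here on restart.
                    .option("checkpointLocation", "/tmp/checkpoints/demo")
                    .start();

            query.awaitTermination();
        }
    }

On restart with the same checkpointLocation, the query resumes from its recorded offsets and aggregation state rather than reprocessing from scratch.)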

Re: [Spark Prometheus Metrics] How to add my own metrics in spark streaming job?

2020-09-29 Thread christinegong
Hi, the Spark job exports metrics to a Prometheus HTTP server on localhost, to be scraped later by the Prometheus service (https://github.com/prometheus/client_java#http). The problem is that when SSHing into the EMR instances themselves, the metrics are only visible on the driver locally (e.g. curl localhost:9111)…
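(For reference, the client_java pattern being described looks roughly like the sketch below; the port and metric name are illustrative. Note that each Spark executor runs in its own JVM, so an HTTPServer started in driver code exposes only driver-side metrics — likely why only the driver shows them:

    import io.prometheus.client.Counter;
    import io.prometheus.client.exporter.HTTPServer;

    public class DriverMetrics {
        // A custom metric registered with the default collector registry.
        static final Counter recordsProcessed = Counter.build()
                .name("records_processed_total")
                .help("Records processed by the streaming job.")
                .register();

        public static void main(String[] args) throws Exception {
            // Serves the metrics endpoint for Prometheus to scrape,
            // e.g. curl localhost:9111/metrics
            HTTPServer server = new HTTPServer(9111);

            recordsProcessed.inc();  // increment from job logic
        }
    }
)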

Re: Query around Spark Checkpoints

2020-09-29 Thread Bryan Jeffrey
Jungtaek, how would you contrast stateful streaming with checkpoints against writing updates to a Delta Lake table and then using that Delta Lake table as a streaming source for our state stream? Thank you, Bryan. On Mon, Sep 28, 2020 at 9:50 AM Debabrata Ghosh wrote: > Thank You…
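(For what it's worth, the Delta-as-source pattern Bryan describes would look something like this sketch — it assumes the delta-core package is on the classpath, and the paths are placeholders:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class DeltaSourceSketch {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder()
                    .appName("delta-source-sketch")
                    .getOrCreate();

            // Stage 1 writes updates into a Delta table (batch or streaming).
            // Stage 2 then treats that same table as a streaming source:
            Dataset<Row> updates = spark.readStream()
                    .format("delta")
                    .load("/tmp/delta/state-updates");

            updates.writeStream()
                    .format("console")
                    .option("checkpointLocation", "/tmp/checkpoints/delta-reader")
                    .start()
                    .awaitTermination();
        }
    }

The trade-off being asked about: checkpointed state is internal to one query, while a Delta table makes the intermediate state durable, queryable, and shareable across queries.)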

Should SHOW TABLES statement return a hive-compatible output?

2020-09-29 Thread Ricardo Martinelli de Oliveira
Hello, I came across an issue [1] in PyHive that involves the SHOW TABLES output from Thrift Server. When you run a SHOW TABLES statement in beeline, it returns a table with the following fields: (i) schema name, (ii) table name, (iii) temporary table flag. This output is different from…
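(To illustrate the mismatch: Spark returns three columns, while Hive returns a single tab_name column, which is the shape clients like PyHive expect. A sketch, with the caveat that exact column names can vary across Spark versions:

    import org.apache.spark.sql.SparkSession;

    public class ShowTablesSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("show-tables-sketch")
                    .enableHiveSupport()   // requires Hive support on the classpath
                    .getOrCreate();

            // Spark's output has three columns:
            spark.sql("SHOW TABLES").show();
            // +--------+---------+-----------+
            // |database|tableName|isTemporary|
            // +--------+---------+-----------+

            // Hive/beeline output for the same statement is a single column:
            // +-----------+
            // | tab_name  |
            // +-----------+
        }
    }
)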

Re: [Spark Prometheus Metrics] How to add my own metrics in spark streaming job?

2020-09-29 Thread Artemis User
I am confused by your question. Are you running the Spark cluster on AWS EMR and trying to output the results to a Prometheus instance running on your localhost? Isn't your localhost behind a firewall and not accessible from AWS? What does it mean to "have prometheus available in…

[SQL] How to get an encoder for string array in java?

2020-09-29 Thread tanelk
I'm trying to implement org.apache.spark.sql.expressions.Aggregator in Java. Both the input and output columns are arrays of strings. I cannot figure out how to construct a working encoder for the outputEncoder() method and for the UDF registration. The data type on the Java side could be Collection…
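(One common workaround — not from this thread, just a sketch: the Java Encoders API has no factory for plain arrays, so either fall back to Encoders.kryo(String[].class), which works but stores the column as opaque binary, or wrap the array in a bean so Encoders.bean can derive an array<string> schema:

    import java.io.Serializable;
    import java.util.List;

    import org.apache.spark.sql.Encoder;
    import org.apache.spark.sql.Encoders;

    public class ArrayEncoderSketch {
        // Bean wrapper: bean encoders handle List<String> fields,
        // yielding a struct with an array<string> column.
        public static class StringArray implements Serializable {
            private List<String> values;
            public List<String> getValues() { return values; }
            public void setValues(List<String> values) { this.values = values; }
        }

        public static void main(String[] args) {
            // Option 1: bean encoder, keeps a queryable array<string> schema.
            Encoder<StringArray> beanEncoder = Encoders.bean(StringArray.class);

            // Option 2: kryo encoder for the raw array; simpler, but the
            // resulting column is binary rather than array<string>.
            Encoder<String[]> kryoEncoder = Encoders.kryo(String[].class);
        }
    }

The Aggregator's outputEncoder() could then return the bean encoder, and in Spark 3.0+ the UDF can be registered via functions.udaf(aggregator, inputEncoder).)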
