Hello,
Executing "SELECT Muon_Pt FROM rootDF LIMIT 10", where "rootDF" is a temp
view backed by a DSv2 reader yields the attached plan [1]. It appears that
the initial stage is run over every partition in rootDF, even though each
partition has 200k rows (modulo the last partition which holds the
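For reference, a minimal sketch (not from the original message) of the setup
being described: a DataFrame produced by a DataSourceV2 reader is registered
as the temp view "rootDF" and the LIMIT query is issued against it. The
format name "org.example.root" and the file path are placeholder assumptions.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class LimitOverDsv2 {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("LimitOverDsv2").getOrCreate();

    // Placeholder DSv2 format name; substitute the actual reader backing rootDF.
    Dataset<Row> rootDF = spark.read().format("org.example.root").load("events.root");
    rootDF.createOrReplaceTempView("rootDF");

    // The query from the message; explain(true) prints the logical and
    // physical plans, showing how LIMIT 10 is planned over the scan.
    spark.sql("SELECT Muon_Pt FROM rootDF LIMIT 10").explain(true);
  }
}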
Hi Bryan,
Thanks for the interest! Unfortunately there's a lack of support from
committers for SPARK-28190 (I have been struggling with a lack of committer
support for Structured Streaming contributions). I hope things will get
better, but in the meantime, could you please try out my own project instead?
Hello Apache Spark Support Team,
I am writing Spark code in Java. I am using the Dataset API and I have run
into an issue. I am doing something like this:
public Dataset> groupByKey(Dataset> consumers, Class kClass) {
    consumers.groupBy("_1").agg(collect_list(col("_2"))).printSchema();
    return
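A minimal, self-contained sketch of the same groupBy + collect_list pattern.
The Tuple2 input type, the local master, and the sample data are assumptions
for illustration, since the generics in the snippet above were lost.

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.collect_list;

import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import scala.Tuple2;

public class GroupByKeyExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("GroupByKeyExample")
        .master("local[*]").getOrCreate();

    // Sample input: pairs of (key, value) encoded as a typed Dataset of Tuple2.
    Dataset<Tuple2<String, Integer>> consumers = spark.createDataset(
        Arrays.asList(new Tuple2<>("a", 1), new Tuple2<>("a", 2), new Tuple2<>("b", 3)),
        Encoders.tuple(Encoders.STRING(), Encoders.INT()));

    // Group on the first tuple field and collect the second into a list;
    // the result is an untyped DataFrame with schema (_1, collect_list(_2)).
    Dataset<Row> grouped = consumers.groupBy(col("_1")).agg(collect_list(col("_2")));
    grouped.printSchema();
    grouped.show();
  }
}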
Hi, Jungtaek.
We've been investigating the use of Spark Structured Streaming to replace
our Spark Streaming operations. We have several cases where we're using
mapWithState to maintain state across batches, often with high volumes of
data. We took a look at the Structured Streaming stateful
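For comparison, a minimal sketch (not from this thread; the socket source and
the word-count state are assumptions) of mapGroupsWithState, the Structured
Streaming stateful API that plays the role of mapWithState by keeping one
state value per key across micro-batches.

import java.util.Arrays;

import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.api.java.function.MapGroupsWithStateFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.GroupStateTimeout;

import scala.Tuple2;

public class StatefulWordCount {
  public static void main(String[] args) throws Exception {
    SparkSession spark = SparkSession.builder().appName("StatefulWordCount").getOrCreate();

    // Assumed streaming source: lines of text from a socket, split into words.
    Dataset<String> words = spark.readStream()
        .format("socket").option("host", "localhost").option("port", 9999)
        .load()
        .as(Encoders.STRING())
        .flatMap((FlatMapFunction<String, String>) line ->
            Arrays.asList(line.split(" ")).iterator(), Encoders.STRING());

    // Keep a running count per word across micro-batches, analogous to mapWithState.
    Dataset<Tuple2<String, Long>> counts = words
        .groupByKey((MapFunction<String, String>) w -> w, Encoders.STRING())
        .mapGroupsWithState(
            (MapGroupsWithStateFunction<String, String, Long, Tuple2<String, Long>>)
                (word, values, state) -> {
                  long count = state.exists() ? state.get() : 0L;
                  while (values.hasNext()) { values.next(); count++; }
                  state.update(count);
                  return new Tuple2<>(word, count);
                },
            Encoders.LONG(),
            Encoders.tuple(Encoders.STRING(), Encoders.LONG()),
            GroupStateTimeout.NoTimeout());

    counts.writeStream().outputMode("update").format("console").start().awaitTermination();
  }
}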
Hello,
I'm trying to build a spark+hadoop docker image compatible with OpenShift.
I've used the oshinko Spark build script here
https://github.com/radanalyticsio/openshift-spark
to build something with the Hadoop jars on the classpath to allow use of S3 storage.
However I'm now stuck on the spark
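In case it helps, a minimal sketch (the bucket, endpoint, and credential
values are placeholder assumptions) of reading from S3 through the s3a
connector once the hadoop-aws jars are on the image's classpath.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class S3ReadExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("S3ReadExample")
        // s3a credentials and endpoint; on OpenShift these would normally come
        // from a secret or environment variables rather than literals.
        .config("spark.hadoop.fs.s3a.access.key", "MY_ACCESS_KEY")
        .config("spark.hadoop.fs.s3a.secret.key", "MY_SECRET_KEY")
        .config("spark.hadoop.fs.s3a.endpoint", "s3.amazonaws.com")
        .getOrCreate();

    // Read a CSV file from a placeholder bucket via the s3a filesystem.
    Dataset<Row> df = spark.read().option("header", "true").csv("s3a://my-bucket/data.csv");
    df.show();
  }
}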