Optimizing LIMIT in DSv2

2020-03-30 Thread Andrew Melo
Hello, Executing "SELECT Muon_Pt FROM rootDF LIMIT 10", where "rootDF" is a temp view backed by a DSv2 reader, yields the attached plan [1]. It appears that the initial stage runs over every partition in rootDF, even though each partition has 200k rows (modulo the last partition, which holds the
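A minimal sketch of the setup being described (the "root" format name and file path are hypothetical placeholders for the DSv2 reader in question); calling explain() shows whether the LIMIT reaches the reader or is only applied after a full scan:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    SparkSession spark = SparkSession.builder().appName("limit-pushdown-check").getOrCreate();
    Dataset<Row> rootDF = spark.read().format("root").load("events.root"); // hypothetical DSv2 source
    rootDF.createOrReplaceTempView("rootDF");
    // If the physical plan shows CollectLimit above a full scan of the source,
    // the limit is not being pushed down into the reader.
    spark.sql("SELECT Muon_Pt FROM rootDF LIMIT 10").explain();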

Re: Data Source - State (SPARK-28190)

2020-03-30 Thread Jungtaek Lim
Hi Bryan, Thanks for the interest! Unfortunately there's a lack of committer support for SPARK-28190 (I have been struggling with the lack of support for structured streaming contributions). I hope things will get better, but in the meantime, could you please try out my own project instead?

[Spark SQL]: How to deserialize a column of ArrayType to java.util.List

2020-03-30 Thread Dima Pavlyshyn
Hello Apache Spark Support Team, I am writing Spark in Java now. I use the Dataset API and I am facing an issue doing something like this: public Dataset<Tuple2<K, List<V>>> groupByKey(Dataset<Tuple2<K, V>> consumers, Class<K> kClass) { consumers.groupBy("_1").agg(collect_list(col("_2"))).printSchema(); return
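The original message is truncated, but the subject suggests the sticking point is turning the ArrayType column produced by collect_list back into a java.util.List. A minimal sketch of one way to do that (key/value types assumed to be String/Integer for illustration) relies on Row.getList, which already returns a java.util.List:

    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.collect_list;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    Dataset<Row> grouped = consumers
        .groupBy(col("_1"))
        .agg(collect_list(col("_2")).alias("values"));

    // Row.getList(...) deserializes an ArrayType column to a java.util.List directly,
    // sidestepping the need for a custom encoder for List.
    for (Row row : grouped.collectAsList()) {
        String key = row.getString(0);
        java.util.List<Integer> values = row.getList(1);
        // ... use key and values ...
    }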

Data Source - State (SPARK-28190)

2020-03-30 Thread Bryan Jeffrey
Hi, Jungtaek. We've been investigating the use of Spark Structured Streaming to replace our Spark Streaming operations. We have several cases where we're using mapWithState to maintain state across batches, often with high volumes of data. We took a look at the Structured Streaming stateful
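For context, the closest Structured Streaming analogue to mapWithState is (flat)mapGroupsWithState on a KeyValueGroupedDataset. A minimal sketch of a per-key running count (the Event class and its getKey() accessor are hypothetical stand-ins for the poster's data):

    import org.apache.spark.api.java.function.MapFunction;
    import org.apache.spark.api.java.function.MapGroupsWithStateFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.KeyValueGroupedDataset;
    import org.apache.spark.sql.streaming.GroupStateTimeout;

    KeyValueGroupedDataset<String, Event> grouped = events.groupByKey(
        (MapFunction<Event, String>) Event::getKey, Encoders.STRING());

    Dataset<Long> counts = grouped.mapGroupsWithState(
        (MapGroupsWithStateFunction<String, Event, Long, Long>) (key, values, state) -> {
            long count = state.exists() ? state.get() : 0L;  // state carried over from prior batches
            while (values.hasNext()) { values.next(); count++; }
            state.update(count);                             // persist state across triggers
            return count;
        },
        Encoders.LONG(),                // state encoder
        Encoders.LONG(),                // output encoder
        GroupStateTimeout.NoTimeout());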

Building Spark + hadoop docker for openshift

2020-03-30 Thread Antoine DUBOIS
Hello, I'm trying to build a spark+hadoop docker image compatible with Openshift. I've used the oshinko Spark build script here https://github.com/radanalyticsio/openshift-spark to build something with the Hadoop jars on the classpath to allow usage of S3 storage. However I'm now stuck on the spark
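Since the goal is S3 access, a minimal Java smoke test for such an image might look like the following, assuming hadoop-aws and the matching AWS SDK jars are on the classpath (the endpoint, credentials, and bucket path are placeholders):

    import org.apache.spark.sql.SparkSession;

    SparkSession spark = SparkSession.builder()
        .appName("s3a-smoke-test")
        .config("spark.hadoop.fs.s3a.endpoint", "https://s3.example.org")  // placeholder endpoint
        .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")            // placeholder credential
        .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")            // placeholder credential
        .getOrCreate();

    // A successful read confirms the S3A filesystem and its jars are wired up correctly.
    spark.read().text("s3a://some-bucket/some-path").show(5);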