Re: Spark 2.3.0 DataFrame.write.parquet() behavior change from 2.2.0

2018-05-07 Thread Yuanjian Li
Yeah. What’s the scenario where you need the empty partitions to be configurable? Do you still need the empty files? > On May 8, 2018, at 03:35, Victor Tso-Guillen wrote: > > Found it: SPARK-21435 > > On Mon, May 7, 2018 at 2:18 PM Victor Tso-Guillen

Best place to persist offsets into Zookeeper

2018-05-07 Thread ravidspark
Hi All, I have the below problem in Spark Kafka streaming. Environment: Spark-2.2.0 Problem: We have written our own logic for offset management in ZooKeeper when streaming data with Spark + Kafka. Everything is working fine and we are able to control the offset commits to ZooKeeper during
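
A minimal sketch of the pattern this thread describes: commit offsets to ZooKeeper only after each batch's output succeeds. It assumes Curator as the ZooKeeper client and the spark-streaming-kafka-0-10 integration; the connect string and znode layout are invented for illustration.

    import org.apache.curator.framework.CuratorFrameworkFactory
    import org.apache.curator.retry.ExponentialBackoffRetry
    import org.apache.kafka.clients.consumer.ConsumerRecord
    import org.apache.spark.streaming.dstream.InputDStream
    import org.apache.spark.streaming.kafka010.{HasOffsetRanges, OffsetRange}

    // Illustrative Curator client; the connect string is an assumption.
    val zk = CuratorFrameworkFactory.newClient("zk1:2181", new ExponentialBackoffRetry(1000, 3))
    zk.start()

    // One znode per topic-partition; this path layout is hypothetical.
    def saveOffsets(ranges: Array[OffsetRange]): Unit = ranges.foreach { r =>
      val path = s"/consumers/myapp/offsets/${r.topic}/${r.partition}"
      val data = r.untilOffset.toString.getBytes("UTF-8")
      if (zk.checkExists().forPath(path) == null)
        zk.create().creatingParentsIfNeeded().forPath(path, data)
      else zk.setData().forPath(path, data)
    }

    def process(stream: InputDStream[ConsumerRecord[String, String]]): Unit =
      stream.foreachRDD { rdd =>
        val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
        // ... write the batch's output here ...
        saveOffsets(ranges) // commit only once the output has succeeded
      }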

Re: Spark 2.3.0 DataFrame.write.parquet() behavior change from 2.2.0

2018-05-07 Thread Victor Tso-Guillen
Found it: SPARK-21435 On Mon, May 7, 2018 at 2:18 PM Victor Tso-Guillen wrote: > It appears that between 2.2.0 and 2.3.0 DataFrame.write.parquet() skips > writing empty parquet files for empty partitions. Is this configurable? Is > there a Jira that tracks this change? > >

Spark 2.3.0 DataFrame.write.parquet() behavior change from 2.2.0

2018-05-07 Thread Victor Tso-Guillen
It appears that between 2.2.0 and 2.3.0 DataFrame.write.parquet() skips writing empty parquet files for empty partitions. Is this configurable? Is there a Jira that tracks this change? Thanks, Victor
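
A tiny repro of the behavior in question (path and numbers arbitrary). On 2.2.0 every partition produced a part file; on 2.3.0, per SPARK-21435 found later in this thread, empty partitions are skipped.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("empty-partitions").getOrCreate()
    import spark.implicits._

    // 2 rows spread across 8 partitions leaves at least 6 partitions empty.
    val df = Seq(1, 2).toDF("id").repartition(8)

    // 2.2.0: one part file per partition, empty ones included.
    // 2.3.0: empty partitions no longer produce part files.
    df.write.parquet("/tmp/empty-partitions-test")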

Re: Pickling Keras models for use in UDFs

2018-05-07 Thread erp12
Great idea! That works perfectly, thank you!

Guava dependency issue

2018-05-07 Thread Stephen Boesch
I am intermittently running into Guava dependency issues across multiple Spark projects. I have tried maven shade / relocate but it does not resolve the issues. The current project is extremely simple: *no* additional dependencies beyond scala, spark, and scalatest - yet the issues remain (and
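
For reference, if sbt is an option, the sbt-assembly analogue of a maven-shade relocation is an assemblyShadeRules entry; a sketch, with the shaded package name arbitrary, and whether it helps depends on which artifact actually drags in the conflicting Guava:

    // build.sbt -- sketch of relocating Guava with sbt-assembly
    assemblyShadeRules in assembly := Seq(
      // Rewrite com.google.common.* into a private namespace so Spark's
      // own Guava on the classpath no longer clashes with ours.
      ShadeRule.rename("com.google.common.**" -> "shaded.guava.@1").inAll
    )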

Re: Spark UI Source Code

2018-05-07 Thread Marcelo Vanzin
On Mon, May 7, 2018 at 1:44 AM, Anshi Shrivastava wrote: > I've found a KVStore wrapper which stores all the metrics in a LevelDB > store. This KVStore wrapper is available as a Spark dependency, but we cannot > access the metrics directly from Spark since they are
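
One supported way to read those metrics from outside, without touching the internal KVStore, is Spark's monitoring REST API under /api/v1 on the driver UI (the history server exposes the same endpoints). A minimal sketch; host, port, and the application id are assumptions:

    import scala.io.Source

    val base = "http://localhost:4040/api/v1"  // driver UI, default port

    // List running applications, then pull per-stage metrics for one of them.
    val apps   = Source.fromURL(s"$base/applications").mkString
    val appId  = "app-20180507120000-0001"     // illustrative application id
    val stages = Source.fromURL(s"$base/applications/$appId/stages").mkString
    println(stages)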

Re: Advice on multiple streaming job

2018-05-07 Thread Dhaval Modi
Hi Gerard, Our source is Kafka, and we are using the standard streaming API (DStreams). Our requirement: we have hundreds of Kafka topics, and each topic sends different messages in a complex JSON format. Topics are structured per domain, so each topic is independent of the others. These JSON
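
With DStreams, one way to keep the topics independent is a direct stream per topic inside a single StreamingContext; a sketch assuming the spark-streaming-kafka-0-10 integration, with broker, group id, topic names, and batch interval all invented:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._

    val ssc = new StreamingContext(new SparkConf().setAppName("multi-topic"), Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker:9092",            // illustrative
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "domain-consumers"
    )

    val topics = Seq("domain-a", "domain-b", "domain-c") // stand-ins for the real topics

    // One independent stream per topic, each with its own parsing logic.
    topics.foreach { topic =>
      val stream = KafkaUtils.createDirectStream[String, String](
        ssc,
        LocationStrategies.PreferConsistent,
        ConsumerStrategies.Subscribe[String, String](Seq(topic), kafkaParams)
      )
      stream.map(_.value).foreachRDD { rdd => /* per-domain JSON handling */ }
    }

    ssc.start()
    ssc.awaitTermination()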

Watch Zookeeper in Spark Closure

2018-05-07 Thread 王 纯超
Hi, I want to listen for ZooKeeper node changes in a Spark closure so that I can change behavior dynamically, so I use the PathChildrenCache class provided by Curator. But somehow the changes are not captured. The main function in the closure: public class DpiScenarioBasedFilter implements FlatMapFunction
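
For comparison, the usual Curator wiring looks like the sketch below (connect string and znode path invented). One common reason events never arrive is that the cache is never start()ed, or that it is built on the driver while the closure actually runs on the executors, so no cache exists where the events are expected.

    import org.apache.curator.framework.{CuratorFramework, CuratorFrameworkFactory}
    import org.apache.curator.framework.recipes.cache.{PathChildrenCache, PathChildrenCacheEvent, PathChildrenCacheListener}
    import org.apache.curator.retry.ExponentialBackoffRetry

    val client = CuratorFrameworkFactory.newClient("zk1:2181", new ExponentialBackoffRetry(1000, 3))
    client.start()

    // Watch children of an illustrative config znode; cacheData = true keeps payloads.
    val cache = new PathChildrenCache(client, "/dpi/scenarios", true)
    cache.getListenable.addListener(new PathChildrenCacheListener {
      override def childEvent(c: CuratorFramework, event: PathChildrenCacheEvent): Unit =
        event.getType match {
          case PathChildrenCacheEvent.Type.CHILD_ADDED |
               PathChildrenCacheEvent.Type.CHILD_UPDATED =>
            // swap in the new filter behavior here
            println(s"changed: ${event.getData.getPath}")
          case _ => ()
        }
    })
    cache.start(PathChildrenCache.StartMode.BUILD_INITIAL_CACHE) // without start(), no events fire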

Re: Advice on multiple streaming job

2018-05-07 Thread Gerard Maas
Dhaval, Which streaming API are you using? In Structured Streaming, you can start several streaming queries within the same context. Kind regards, Gerard. On Sun, May 6, 2018 at 7:59 PM, Dhaval Modi wrote: > Hi Susan, > > Thanks for your response. > > Will try
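
A minimal sketch of what Gerard describes: two independent queries running concurrently in one session. Broker, topic names, and paths are invented for illustration.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("multi-query").getOrCreate()

    // Each topic gets its own query; all queries share the session.
    def startQuery(topic: String) =
      spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092") // illustrative
        .option("subscribe", topic)
        .load()
        .selectExpr("CAST(value AS STRING) AS json")
        .writeStream
        .format("parquet")
        .option("path", s"/data/$topic")
        .option("checkpointLocation", s"/checkpoints/$topic")
        .start()

    Seq("domain-a", "domain-b").foreach(startQuery)
    spark.streams.awaitAnyTermination()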