date:20181229

Count() not working on streaming dataframe/structured streaming

2018-12-29 Thread Ritesh Shah

I have written this simple code to try streaming aggregation in Spark 2.4. Somehow, the job keeps running but does not return any result. It returns three columns (JobType, Timestamp and TS) if I remove the groupBy and count aggregation. I would really appreciate any help. val edgeDF = spark

RE: What are the alternatives to nested DataFrames?

2018-12-29 Thread email

1 - I am not sure how I can do what you suggest for #1, because I use the entries in the initial df to build the query and then get the second df from it. Could you explain more? 2 - I also thought about doing what you describe in #2, but if I am not mistaken, if I use regular Scala data

Postgres Read JDBC with COPY TO STDOUT

2018-12-29 Thread Nicolas Paris

Hi, the Spark Postgres JDBC reader is limited because it relies on basic SELECT statements with fetchsize, and it crashes on large tables even if multiple partitions are set up with lower/upper bounds. I am about to write a new Postgres JDBC reader based on "COPY TO STDOUT". It would stream the data

Re: [spark-sql] Hive failing on insert empty array into parquet table

2018-12-29 Thread 李斌松

https://issues.apache.org/jira/browse/HIVE-13632 李斌松 wrote on Saturday, December 29, 2018 at 4:08 PM: > Hive has fixed this problem, which is not fixed in hive-exec-1.2.1.spark2.jar