Does DataSet/DataFrame support ReduceBy() as RDD does?

2019-06-24 Thread Qian He
I found RDD.reduceBy() really useful and much more efficient than groupBy(). Do DS/DF have similar APIs?

Re: [Meta] Moderation request diversion?

2019-06-24 Thread Vadim Semenov
Just set up a filter. [image: Screen Shot 2019-06-24 at 4.51.20 PM.png]
On Mon, Jun 24, 2019 at 4:46 PM Jeff Evans wrote:
> There seem to be a lot of people trying to unsubscribe via the main address, rather than following the instructions from the welcome email. Of course, this is not all

[Meta] Moderation request diversion?

2019-06-24 Thread Jeff Evans
There seem to be a lot of people trying to unsubscribe via the main address, rather than following the instructions from the welcome email. Of course, this is not all that surprising, but it leads to a lot of pointless threads*. Is there a way to enable automatic detection and diversion of such

Re: RE - Apache Spark compatibility with Hadoop 2.9.2

2019-06-24 Thread Bipul kumar
Thank you for the clarification.
Respectfully, Bipul
PUBLIC KEY 97F0 2E08 7DE7 D538 BDFA B708 86D8 BE27 8196 D466
** Please excuse brevity and typos. **
On Mon, Jun 24, 2019 at 4:24 AM Mark Bidewell wrote:
> Note that we selected Spark

unsubscribe

2019-06-24 Thread Dave Moyers

Spark locking Hive partition

2019-06-24 Thread Artur Sukhenko
Hi, I have a Spark Streaming app (1m batch) writing Parquet data to a partition, e.g.
val hdfsPath = s"$dbPath/$tableName/year=$year/month=$month/day=$day"
df.write.mode(SaveMode.Append).parquet(hdfsPath)
I wonder whether I would lose data if I overwrite this partition with Hive (compaction/deduplication)

unsubscribe

2019-06-24 Thread Song Yang