Re: Reading Hive tables Parallel in Spark

2017-07-17 Thread Simon Kitching
Have you tried simply making a list of your tables and then using SparkContext.makeRDD(Seq)? I.e.:

    val tablenames = List("table1", "table2", "table3", ...)
    val tablesRDD = sc.makeRDD(tablenames, nParallelTasks)
    tablesRDD.foreach(...)

> On 17.07.2017 at 14:12, FN wrote:
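One caveat with the sketch above: the body of tablesRDD.foreach runs on the executors, where the driver's SparkSession is not available, so each executor would need its own Hive connection there. A driver-side variant that gets the same fan-out with Scala Futures is shown below; this is a minimal sketch, not a tested recipe, and the table names and the row-count action are placeholders for illustration:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import org.apache.spark.sql.SparkSession

object ParallelHiveRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parallel-hive-read")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical table names -- substitute the real Hive tables.
    val tablenames = List("table1", "table2", "table3")

    // Kick off one read per table from the driver; Spark schedules the
    // resulting jobs concurrently (subject to spark.scheduler.mode and
    // the cores available to the application).
    val counts: Seq[Future[(String, Long)]] = tablenames.map { name =>
      Future(name -> spark.table(name).count())
    }

    Await.result(Future.sequence(counts), Duration.Inf)
      .foreach { case (name, n) => println(s"$name: $n rows") }

    spark.stop()
  }
}
```

With the default FIFO scheduler the jobs still queue one after another; setting spark.scheduler.mode=FAIR lets them share the cluster.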

Re: Glue-like Functionality

2017-07-10 Thread Simon Kitching
Sounds similar to the Confluent Kafka Schema Registry and Kafka Connect. The Schema Registry and Kafka Connect themselves are open-source, but some of the datasource-specific adapters, and the GUIs to manage it all, are not (see Confluent Enterprise Edition). Note that the Schema Registry