Re: Merging Parquet Files

2020-09-03 Thread Michael Segel
Hi, I think you’re asking the right question; however, you’re assuming that he’s on the cloud, and he never mentioned the size of the file. It could be that he’s got a lot of small-ish data sets. 1GB is kinda small in relative terms. Again, YMMV. Personally if you’re going

Re: Submitting Spark Job thru REST API?

2020-09-03 Thread Eric Beabes
Thank you all for your responses. Will try them out.

On Thu, Sep 3, 2020 at 12:06 AM tianlangstudio wrote:
> Hello, Eric
> Maybe you can use Spark JobServer 0.10.0
> https://github.com/spark-jobserver/spark-jobserverl
> We used this with Spark 1.6, and it is awesome. You know
> the project is

Spark Streaming Checkpointing

2020-09-03 Thread András Kolbert
Hi All, I have a Spark Streaming application (2.4.4, Kafka 0.8 >> so Spark Direct Streaming) running just fine. I create a context in the following way:

    ssc = StreamingContext(sc, 60)
    opts = {"metadata.broker.list": kafka_hosts, "auto.offset.reset": "largest", "group.id": run_type}
    kvs =

Re: Submitting Spark Job thru REST API?

2020-09-03 Thread tianlangstudio
Hello, Eric. Maybe you can use Spark JobServer 0.10.0: https://github.com/spark-jobserver/spark-jobserverl We used this with Spark 1.6, and it is awesome. The project is still very active, so I highly recommend it to you. Fusion Zhu