I think what George is looking for is a way to determine ahead of time the partition IDs that Spark will use when writing output.
George, I believe this is an example of what you're looking for:

https://github.com/databricks/spark-redshift/blob/184b4428c1505dff7b4365963dc344197a92baa9/src/main/scala/com/databricks/spark/redshift/RedshiftWriter.scala#L240-L257

Specifically, the part that calls `TaskContext.get.partitionId()`. I don't know how much of that is part of Spark's public API, but there it is.

It would be useful if Spark offered a way to get a manifest of the output files for any given write operation, similar to Redshift's MANIFEST option <https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html>. This would help when, for example, you need to pass the list of files Spark wrote to some other system (like Redshift) and don't want to worry about the consistency guarantees of your object store's list operations.

Nick

On Fri, Sep 25, 2020 at 2:00 PM EveLiao <evelia...@gmail.com> wrote:
> If I understand your problem correctly, the prefix you provided is actually
> "0000-" + UUID. You can get it with a UUID generator like
> https://docs.python.org/3/library/uuid.html#uuid.uuid4.
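To make the idea above concrete, here is a minimal sketch of how one could predict per-partition output file prefixes ahead of time by pairing each partition ID with a job-level UUID, in the spirit of the linked RedshiftWriter code. The `part-%05d-%s` naming format and the `predicted_part_names` helper are assumptions for illustration, not Spark's public API:

```python
import uuid

def predicted_part_names(num_partitions: int, job_uuid: str) -> list:
    """Build the expected output file prefix for each partition ID.

    Assumes a "part-<5-digit partition id>-<job uuid>" convention,
    which is illustrative rather than guaranteed by Spark.
    """
    return ["part-{:05d}-{}".format(pid, job_uuid)
            for pid in range(num_partitions)]

# Generate one UUID per write job, then enumerate the expected prefixes.
job_uuid = str(uuid.uuid4())
for name in predicted_part_names(3, job_uuid):
    print(name)
```

Because the UUID is fixed per job and the partition IDs are simply 0..N-1, the full list of prefixes can be computed before the write finishes, which is effectively a client-side manifest.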