Hi All,

I have a data sink that can only be written to through a specific Python 
API. To write to it I am (ab)using foreachPartition(writing_func) from PySpark, 
which works pretty well.
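
For context, here is a minimal sketch of that pattern. The write_record 
callable is a hypothetical stand-in for the Python-only output API (assumed to 
return the number of bytes it wrote), and make_writing_func / run_with_spark 
are illustrative names. The per-partition byte count is reported through a 
regular PySpark accumulator, which is only a stand-in for the place where the 
task metric would ideally be updated; as far as I know it does not feed 
TaskMetrics.

```python
def make_writing_func(write_record, bytes_acc):
    """Build a function suitable for rdd.foreachPartition.

    write_record: hypothetical stand-in for the Python-only output API;
                  takes one record and returns the number of bytes written.
    bytes_acc:    any object with an .add(int) method, for example a
                  PySpark Accumulator created with sc.accumulator(0).
    """
    def writing_func(rows):
        written = 0
        for row in rows:
            written += write_record(row)
        # End of partition: this is the point where bytesWritten would
        # ideally be set. The accumulator is a driver-side substitute only.
        bytes_acc.add(written)

    return writing_func


def run_with_spark():
    """Example wiring; requires a working PySpark installation."""
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    acc = sc.accumulator(0)

    def fake_write(row):
        # Assumption for illustration: pretend the API wrote the UTF-8 bytes.
        return len(str(row).encode("utf-8"))

    rdd = sc.parallelize(["a", "bb", "ccc"], 2)
    rdd.foreachPartition(make_writing_func(fake_write, acc))
    return acc.value  # total is readable on the driver only
```

Calling run_with_spark() on a machine with PySpark returns the summed byte 
count on the driver, but the count does not appear as output bytes for the 
tasks, which is why I am asking about the task metrics themselves.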

I wonder if it's possible to somehow update the task metrics (specifically 
setBytesWritten) at the end of every partition. On the surface it seems 
impossible to me, for two reasons:

  1.  I don't think there is an open py4j gateway in a task context.
  2.  TaskMetrics is accessed via a ThreadLocal, so even with an open gateway 
      I don't think a call would reach the specific thread running the task.


Does anyone know of an existing solution or a workaround?


Thanks

Shay
