Never mind, I think PySpark is already doing async socket read/write, but on the Scala side, in PythonRDD.scala.
On Sat, Feb 6, 2016 at 6:27 PM, Renyi Xiong <renyixio...@gmail.com> wrote:
> Hi,
>
> is it a good idea to have 2 threads in a pyspark worker? - a main thread
> responsible for receiving and sending data over the socket, while the other
> thread calls user functions to process the data?
>
> since the CPU is idle (?) during network I/O, this should improve
> concurrency quite a bit.
>
> can an expert answer the question? what are the pros and cons here?
>
> thanks,
> Renyi.
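For anyone curious what the proposed design would look like, here is a minimal, self-contained sketch of the two-thread idea from the question: one thread blocks on socket reads while a second thread runs the user function, with a bounded queue between them. All names here (`run_worker`, `user_function`, etc.) are illustrative and are not PySpark's actual API.

```python
import socket
import threading
import queue

def user_function(record):
    # Stand-in for an arbitrary user-supplied transformation.
    return record.upper()

def run_worker(conn, results):
    inbox = queue.Queue(maxsize=64)  # bounded, so the reader can't run away
    SENTINEL = None

    def io_thread():
        # Read newline-delimited records off the socket. The thread is
        # mostly blocked in recv() here, which is exactly the idle time
        # the question wants to overlap with computation.
        buf = b""
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            buf += chunk
            while b"\n" in buf:
                line, buf = buf.split(b"\n", 1)
                inbox.put(line.decode())
        inbox.put(SENTINEL)

    def compute_thread():
        # Apply the user function to each record as it arrives.
        while True:
            record = inbox.get()
            if record is SENTINEL:
                break
            results.append(user_function(record))

    t1 = threading.Thread(target=io_thread)
    t2 = threading.Thread(target=compute_thread)
    t1.start(); t2.start()
    t1.join(); t2.join()

# Demo: feed a few records through a local socket pair.
a, b = socket.socketpair()
out = []
worker = threading.Thread(target=run_worker, args=(b, out))
worker.start()
a.sendall(b"alpha\nbeta\ngamma\n")
a.close()  # signals end-of-stream; recv() on the other end returns b""
worker.join()
print(out)  # ['ALPHA', 'BETA', 'GAMMA']
```

One con worth noting: in CPython the GIL prevents the two threads from running Python bytecode in parallel, but a blocking `recv()` releases the GIL, so network I/O and Python-level computation can genuinely overlap in this layout.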