FYI: Filed https://issues.apache.org/jira/browse/SPARK-24466 and provided the patch https://github.com/apache/spark/pull/21497
On Tue, Jun 5, 2018 at 11:30 AM, Jungtaek Lim <kabh...@gmail.com> wrote:
> Yeah, that's why I initiated this thread. The socket source is
> expected to be used from examples in the official documentation or for
> quick experiments, where we tend to simply use netcat.
>
> I'll file an issue and provide the fix.
>
> On Tue, Jun 5, 2018 at 1:48 AM, Joseph Torres <joseph.tor...@databricks.com> wrote:
>> I tend to agree that this is a bug. It's kinda silly that nc does this,
>> but a socket connector that doesn't work with netcat will surely seem
>> broken to users. It wouldn't be a huge change to defer opening the socket
>> until a read is actually required.
>>
>> On Sun, Jun 3, 2018 at 9:55 PM, Jungtaek Lim <kabh...@gmail.com> wrote:
>>> Hi devs,
>>>
>>> Not sure I'll hear back soon since Spark Summit is just around the
>>> corner, but I'd like to post this and wait.
>>>
>>> While playing with Spark 2.4.0-SNAPSHOT, I found that the nc command
>>> exits before reading the actual data, so the query also exits with an
>>> error.
>>>
>>> The reason is that a temporary reader is launched to read the schema
>>> and then closed, after which the real reader is opened. While a
>>> reliable socket server should handle this without any issue, nc
>>> normally can't handle multiple connections and simply exits when the
>>> temporary reader closes.
>>>
>>> I'd like to file an issue and contribute a fix if we think this is a
>>> bug (otherwise we need to replace the nc utility with another one,
>>> maybe our own implementation?), but I'm not sure we're happy to apply
>>> a workaround for a specific source.
>>>
>>> I'd like to hear opinions before giving it a shot.
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
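For anyone hitting this before the fix lands: some netcat builds (e.g. the BSD variant) accept `nc -lk <port>` to keep listening after a client disconnects, which survives the temporary schema reader. Where that flag isn't available, a tiny server that re-accepts connections works the same way. A minimal sketch in Python — the port, sample lines, and accept count here are illustrative, not anything from the thread:

```python
import socket

def serve(host="127.0.0.1", port=9999, lines=("hello", "world"), accepts=2):
    # Unlike a plain `nc -l`, this server survives the client closing the
    # connection and accepts again, so both Spark's temporary schema reader
    # and the real reader get served.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    for _ in range(accepts):
        conn, _addr = srv.accept()
        try:
            for line in lines:
                conn.sendall((line + "\n").encode())
        except BrokenPipeError:
            pass  # client (e.g. the temporary reader) closed early
        finally:
            conn.close()
    srv.close()
```

Run it in place of `nc -lk 9999` while the streaming query is started; each accepted connection gets the sample lines and is then closed.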