FYI: Filed https://issues.apache.org/jira/browse/SPARK-24466 and provided
the patch https://github.com/apache/spark/pull/21497
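For reference, the failure mode discussed below can be simulated without Spark or netcat. This is a hypothetical sketch (not Spark code): a one-shot TCP server that, like plain `nc` on many platforms, stops listening after its first client disconnects, while the client connects twice, mimicking the temporary schema reader followed by the real reader.

```python
import socket
import threading

# Toy stand-in for nc: serve exactly one connection, then shut down.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
srv.listen(1)
port = srv.getsockname()[1]

def serve_once():
    conn, _ = srv.accept()
    conn.recv(1)   # blocks until the peer closes the connection
    conn.close()
    srv.close()    # like nc, stop listening once the first client leaves

t = threading.Thread(target=serve_once)
t.start()

# Step 1: the "temporary reader" connects just to get the schema, then closes.
tmp = socket.create_connection(("127.0.0.1", port))
tmp.close()
t.join()

# Step 2: the real reader re-connects -- but the one-shot server is gone.
try:
    socket.create_connection(("127.0.0.1", port), timeout=1)
    result = "reconnected"
except OSError:
    result = "connection refused"
print(result)
```

Deferring the socket open until a read is actually required (as the patch above does) removes step 1 entirely, so a single-connection server like nc only ever sees one connection.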

On Tue, Jun 5, 2018 at 11:30 AM, Jungtaek Lim <kabh...@gmail.com> wrote:

> Yeah, that's why I initiated this thread: the socket source is mainly used
> in examples in the official documentation and in quick experiments, where
> we tend to simply use netcat.
>
> I'll file an issue and provide the fix.
>
> On Tue, Jun 5, 2018 at 1:48 AM, Joseph Torres <joseph.tor...@databricks.com>
> wrote:
>
>> I tend to agree that this is a bug. It's kinda silly that nc does this,
>> but a socket connector that doesn't work with netcat will surely seem
>> broken to users. It wouldn't be a huge change to defer opening the socket
>> until a read is actually required.
>>
>> On Sun, Jun 3, 2018 at 9:55 PM, Jungtaek Lim <kabh...@gmail.com> wrote:
>>
>>> Hi devs,
>>>
>>> Not sure I'll hear back soon since Spark Summit is just around the
>>> corner, but I just wanted to post this and wait.
>>>
>>> While playing with Spark 2.4.0-SNAPSHOT, I found that the nc command
>>> exits before reading the actual data, so the query also exits with an error.
>>>
>>> The reason is that Spark launches a temporary reader to read the schema,
>>> closes it, and then re-opens a reader. A reliable socket server should
>>> handle this without any issue, but the nc command normally can't handle
>>> multiple connections and simply exits when the temporary reader is closed.
>>>
>>> I would like to file an issue and contribute a fix if we agree this is a
>>> bug (otherwise we'd need to replace the nc utility with another one,
>>> maybe our own implementation?), but I'm not sure we're happy to apply a
>>> workaround for a specific source.
>>>
>>> I'd like to hear opinions before giving it a shot.
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>
>>
