[ https://issues.apache.org/jira/browse/SPARK-21182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065876#comment-16065876 ]
Hyukjin Kwon commented on SPARK-21182:
--------------------------------------

Looks like I can't reproduce this on Windows at the current master. With the reproducer below:

{code:title=Wordcount.scala|borderStyle=solid}
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()
val words = lines.as[String].flatMap(_.split(" "))
val wordCounts = words.groupBy("value").count().sort($"count".desc)
val query = wordCounts.writeStream
  .outputMode("complete")
  .format("console")
  .start()
query.awaitTermination()
{code}

{code:title=nc.py|borderStyle=solid}
import socket

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(('0.0.0.0', 9999))
    s.listen(1)
    conn, _ = s.accept()
    while True:
        conn.sendall(raw_input() + "\n")
{code}

In cmd A:

{code}
C:\...\...>python nc.py
{code}

In cmd B:

{code}
C:\...\...>.\bin\spark-shell -i Wordcount.scala
{code}

In cmd A:

{code}
a
abab
abab
{code}

In cmd B:

{code}
-------------------------------------------
Batch: 0
-------------------------------------------
...
+-----+-----+
|value|count|
+-----+-----+
|     |    1|
+-----+-----+

-------------------------------------------
Batch: 1
-------------------------------------------
...
+-----+-----+
|value|count|
+-----+-----+
|    a|    1|
|     |    1|
+-----+-----+

-------------------------------------------
Batch: 2
-------------------------------------------
...
+-----+-----+
|value|count|
+-----+-----+
| abab|    1|
|    a|    1|
|     |    1|
+-----+-----+

-------------------------------------------
Batch: 3
-------------------------------------------
...
+-----+-----+
|value|count|
+-----+-----+
| abab|    2|
|    a|    1|
|     |    1|
+-----+-----+
{code}

> Structured streaming on Spark-shell on windows
> ----------------------------------------------
>
>                 Key: SPARK-21182
>                 URL: https://issues.apache.org/jira/browse/SPARK-21182
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.1.1
>        Environment: Windows 10
> spark-2.1.1-bin-hadoop2.7
>            Reporter: Vijay
>            Priority: Minor
>
> The structured streaming output operation is failing in the Windows shell.
> As per the error message, the path is being prefixed with the file separator as on Linux, causing the IllegalArgumentException.
> The following is the error message:
> scala> val query = wordCounts.writeStream
>          .outputMode("complete")
>          .format("console")
>          .start()
> java.lang.IllegalArgumentException: Pathname {color:red}*/*{color}C:/Users/Vijay/AppData/Local/Temp/temporary-081b482c-98a4-494e-8cfb-22d966c2da01/offsets from C:/Users/Vijay/AppData/Local/Temp/temporary-081b482c-98a4-494e-8cfb-22d966c2da01/offsets is not a valid DFS filename.
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:197)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
> 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
> 	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
> 	at org.apache.spark.sql.streaming.StreamingQueryManager.createQuery(StreamingQueryManager.scala:222)
> 	at org.apache.spark.sql.streaming.StreamingQueryManager.startQuery(StreamingQueryManager.scala:280)
> 	at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:268)
> ... 52 elided

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
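For context, the {{/C:/...}} pathname highlighted in red above is the classic Windows artifact: a drive-letter path that picked up a leading slash during URI-to-path conversion, which Hadoop's DFS pathname validation then rejects. A minimal Python sketch of the kind of normalization that recovers a usable local path (the function name is hypothetical for illustration, not Spark or Hadoop code):

```python
import re

def strip_leading_slash_before_drive(path):
    # "/C:/Users/..." -> "C:/Users/..."; other paths are left untouched.
    # A leading slash before a drive letter is a common artifact of
    # converting a file URI back to a local path on Windows.
    match = re.match(r"^/([A-Za-z]:/.*)$", path)
    return match.group(1) if match else path

print(strip_leading_slash_before_drive("/C:/Users/Vijay/AppData/Local/Temp/offsets"))
# -> C:/Users/Vijay/AppData/Local/Temp/offsets
```

A POSIX-style path such as {{/tmp/offsets}} does not match the drive-letter pattern and passes through unchanged, which is why a fix along these lines would be Windows-specific.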