[jira] [Resolved] (SPARK-17633) texFile() and wholeTextFiles() count difference

Sean Owen (JIRA) Fri, 30 Sep 2016 04:08:44 -0700

     [ 
https://issues.apache.org/jira/browse/SPARK-17633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sean Owen resolved SPARK-17633.
-------------------------------

I think Spark depends on the semantics of the Hadoop APIs to read data, and 
those APIs do not assume that files can change between reads. At best this is 
an issue with the Hadoop implementation.

> texFile() and wholeTextFiles() count difference
> -----------------------------------------------
>
>                 Key: SPARK-17633
>                 URL: https://issues.apache.org/jira/browse/SPARK-17633
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 1.6.2
>         Environment: Unix/Linux
>            Reporter: Anshul
>
> sc.textFile() creates an RDD of strings from a text file.
> When count() is then performed, the line count is correct. But if more than 
> one line is appended to the file manually, counting the same RDD of strings 
> again increases the result by only 1. 
> With sc.wholeTextFiles(), however, the result is correct.
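
One possible reading of the behaviour described above, sketched in plain 
Python rather than Spark: assume the reader records the file length once (as 
a cached input split would) and afterwards reads every line that starts at or 
before that recorded end offset. Under that assumption, appending several 
lines raises the count by exactly 1, while re-reading the whole file sees all 
of them. The function names below are hypothetical and this is a simplified 
model, not Spark or Hadoop code.

```python
import os
import tempfile


def compute_split_end(path):
    """Record the file length once; stands in for a cached split boundary (assumption)."""
    return os.path.getsize(path)


def count_lines_in_split(path, split_end):
    """Count lines whose start offset is <= split_end (assumed reader rule)."""
    count = 0
    with open(path, "rb") as f:
        while True:
            start = f.tell()
            if start > split_end:
                break  # this line starts beyond the recorded boundary
            line = f.readline()
            if not line:
                break  # end of file
            count += 1
    return count


def count_lines_whole_file(path):
    """Re-read the entire file with no cached boundary."""
    with open(path, "rb") as f:
        return len(f.read().splitlines())


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        with open(path, "wb") as f:
            f.write(b"a\nb\nc\n")           # 3 lines, 6 bytes
        split_end = compute_split_end(path)  # boundary fixed before the append
        before = count_lines_in_split(path, split_end)
        with open(path, "ab") as f:
            f.write(b"d\ne\nf\n")           # append 3 more lines
        after_cached = count_lines_in_split(path, split_end)
        after_whole = count_lines_whole_file(path)
        return before, after_cached, after_whole


if __name__ == "__main__":
    print(demo())
```

In this model only the first appended line starts at the old boundary, so it 
is the only new line the boundary-limited count picks up.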




