Hi Team,

Does this issue also affect the JavaStreamingContext.textFileStream("hdfsfolderpath") API? Please confirm. If yes, could you please help me fix it? I'm using Spark 1.0.0.
Regards,
Rajesh

On Tue, Jul 15, 2014 at 5:42 AM, Tathagata Das <tathagata.das1...@gmail.com> wrote:
> Oh yes, this was a bug and it has been fixed. Check out from the master
> branch!
>
> https://issues.apache.org/jira/browse/SPARK-2362?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20Streaming%20ORDER%20BY%20created%20DESC%2C%20priority%20ASC
>
> TD
>
> On Mon, Jul 7, 2014 at 7:11 AM, Luis Ángel Vicente Sánchez <
> langel.gro...@gmail.com> wrote:
>> I have a basic Spark Streaming job that watches a folder, processes
>> any new file, and updates a column family in Cassandra using the new
>> cassandra-spark-driver.
>>
>> I think there is a problem with SparkStreamingContext.textFileStream...
>> If I start my job in local mode with no files in the watched folder
>> and then copy in a bunch of files, sometimes Spark keeps processing
>> those same files again and again.
>>
>> I have noticed that it usually happens when Spark doesn't detect all new
>> files in one go... i.e. I copied 6 files and Spark detected 3 of them as
>> new and processed them; then it detected the other 3 as new and processed
>> them. After it finished processing all 6 files, it detected the first
>> 3 files as new again and processed them... then the other 3... and again...
>> and again... and again.
>>
>> Should I raise a JIRA issue?
>>
>> Regards,
>>
>> Luis
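For anyone following along, the reprocessing loop Luis describes comes down to the file-stream source forgetting which files it has already handled between polls. A minimal stand-alone sketch of the expected semantics (plain Python, no Spark; the `FileStreamSource` and `poll` names are illustrative, not Spark's actual API):

```python
class FileStreamSource:
    """Sketch of a directory poller that emits each file exactly once."""

    def __init__(self):
        # Remember every file already handed to the job. Losing this
        # bookkeeping between polls is essentially the SPARK-2362 behavior:
        # old files get re-detected as "new" and reprocessed forever.
        self.processed = set()

    def poll(self, files_now_visible):
        """Return only files not seen in any earlier poll, then mark them."""
        new_files = [f for f in files_now_visible if f not in self.processed]
        self.processed.update(new_files)
        return new_files


source = FileStreamSource()
# First poll: only 3 of the 6 copied files are visible yet.
print(source.poll(["a", "b", "c"]))                  # ['a', 'b', 'c']
# Second poll: all 6 are visible, but only the 3 new ones are emitted.
print(source.poll(["a", "b", "c", "d", "e", "f"]))   # ['d', 'e', 'f']
# Third poll: nothing new -- a buggy source without the 'processed' set
# would re-emit the first 3 files here, causing the endless loop above.
print(source.poll(["a", "b", "c", "d", "e", "f"]))   # []
```

With the fix in the master branch, textFileStream should behave like the third poll: previously processed files are never emitted again.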