Hi Team,

Does this issue also affect the JavaStreamingContext.textFileStream("hdfsfolderpath") API? Please confirm. If yes, could you please help me fix it? I'm using Spark 1.0.0.
Regards,
Rajesh

On Tue, Jul 15, 2014 at 5:42 AM, Tathagata Das <tathagata.das1...@gmail.com> wrote:
> Oh yes, this was a bug and it has been fixed. Check out from the master
> branch!
>
> https://issues.apache.org/jira/browse/SPARK-2362?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20Streaming%20ORDER%20BY%20created%20DESC%2C%20priority%20ASC
>
> TD
>
> On Mon, Jul 7, 2014 at 7:11 AM, Luis Ángel Vicente Sánchez <
> langel.gro...@gmail.com> wrote:
>> I have a basic Spark Streaming job that watches a folder, processes
>> any new file, and updates a column family in Cassandra using the new
>> cassandra-spark-driver.
>>
>> I think there is a problem with SparkStreamingContext.textFileStream...
>> If I start my job in local mode with no files in the watched folder
>> and then copy in a bunch of files, sometimes Spark keeps processing
>> those same files again and again.
>>
>> I have noticed that it usually happens when Spark doesn't detect all new
>> files in one go... i.e. I copied 6 files and Spark detected 3 of them as
>> new and processed them; then it detected the other 3 as new and processed
>> them. After it finished processing all 6 files, it detected the first
>> 3 files as new again and processed them... then the other 3... and again...
>> and again... and again.
>>
>> Should I raise a JIRA issue?
>>
>> Regards,
>>
>> Luis
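For anyone following along, the reprocessing loop Luis describes comes down to the file-stream source forgetting which files it has already handled between polls. A minimal stand-alone sketch of the expected semantics (plain Python, no Spark; the `FileStreamSource` and `poll` names are illustrative, not Spark's actual API):

```python
class FileStreamSource:
    """Sketch of a directory poller that emits each file exactly once."""

    def __init__(self):
        # Remember every file already handed to the job. Losing this
        # bookkeeping between polls is essentially the SPARK-2362 behavior:
        # old files get re-detected as "new" and reprocessed forever.
        self.processed = set()

    def poll(self, files_now_visible):
        """Return only files not seen in any earlier poll, then mark them."""
        new_files = [f for f in files_now_visible if f not in self.processed]
        self.processed.update(new_files)
        return new_files


source = FileStreamSource()
# First poll: only 3 of the 6 copied files are visible yet.
print(source.poll(["a", "b", "c"]))                  # ['a', 'b', 'c']
# Second poll: all 6 are visible, but only the 3 new ones are emitted.
print(source.poll(["a", "b", "c", "d", "e", "f"]))   # ['d', 'e', 'f']
# Third poll: nothing new -- a buggy source without the 'processed' set
# would re-emit the first 3 files here, causing the endless loop above.
print(source.poll(["a", "b", "c", "d", "e", "f"]))   # []
```

With the fix in the master branch, textFileStream should behave like the third poll: previously processed files are never emitted again.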