Thanks Akhil for the suggestion; it is now only giving me one part file, part-xxxx. Is there any way I can just create a file rather than a directory? It doesn't seem like there is a saveAsTextFile option for JavaPairReceiverDStream.
Also, for the copy/merge API, how would I add that to my Spark job?

Thanks Akhil!

Best,
Su

On Thu, Feb 12, 2015 at 11:51 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> For a streaming application, it will create a new directory for every batch
> and put the data in it. If you don't want to have multiple part-xxxx files
> inside the directory, you can do a repartition before the saveAs* call:
>
> messages.repartition(1).saveAsHadoopFiles("hdfs://user/ec2-user/", "csv",
>     String.class, String.class, (Class) TextOutputFormat.class);
>
> Thanks
> Best Regards
>
> On Fri, Feb 13, 2015 at 11:59 AM, Su She <suhsheka...@gmail.com> wrote:
>
>> Hello Everyone,
>>
>> I am writing simple word counts to HDFS using:
>>
>> messages.saveAsHadoopFiles("hdfs://user/ec2-user/", "csv", String.class,
>>     String.class, (Class) TextOutputFormat.class);
>>
>> 1) However, every 2 seconds I am getting a new *directory* that is titled
>> as a csv. So I'll have test.csv, which will be a directory with two files
>> inside it called part-00000 and part-00001 (something like that). This
>> obviously makes it very hard for me to read the data stored in the csv
>> files. I am wondering if there is a better way to store the
>> JavaPairReceiverDStream and JavaPairDStream?
>>
>> 2) I know there is a copy/merge Hadoop API for merging files... can this
>> be done inside Java? I am not sure of the logic behind this API if I am
>> using Spark Streaming, which is constantly making new files.
>>
>> Thanks a lot for the help!
>>
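The copy/merge API the thread refers to is Hadoop's FileUtil.copyMerge, which concatenates every file in a source directory into a single destination file (it exists in Hadoop 2.x; it was removed in Hadoop 3, where you would roll your own). In a streaming job it could be invoked after each batch's output directory is written. As a self-contained sketch of the same merge logic in plain Java against the local filesystem (the class name, paths, and sample contents below are hypothetical, for illustration only):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class PartFileMerger {

    // Concatenate every part-* file in srcDir into a single dstFile,
    // in lexicographic order (part-00000, part-00001, ...), mirroring
    // what FileUtil.copyMerge does for a directory on HDFS.
    public static void mergeParts(Path srcDir, Path dstFile) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(srcDir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        parts.sort(null); // Path is Comparable; sorts part-00000 before part-00001

        try (OutputStream out = Files.newOutputStream(dstFile)) {
            for (Path part : parts) {
                Files.copy(part, out); // append this part's bytes to the merged file
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate one batch's saveAsHadoopFiles output directory.
        Path dir = Files.createTempDirectory("batch-out");
        Files.write(dir.resolve("part-00000"), "hello,1\n".getBytes());
        Files.write(dir.resolve("part-00001"), "world,2\n".getBytes());

        Path merged = dir.resolve("merged.csv");
        mergeParts(dir, merged);
        System.out.print(new String(Files.readAllBytes(merged)));
    }
}
```

On an actual cluster the equivalent call would go against HDFS paths (FileSystem, Path from org.apache.hadoop.fs) rather than java.nio, and since Spark Streaming produces a fresh directory per batch interval, the merge would have to run once per batch, e.g. from a foreachRDD hook or a periodic job that sweeps completed output directories.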