Re: Request for Help
Hi,

Not sure this is the right way of doing it, but if you can create a PairRDDFunctions from that RDD, then you can use the following piece of code to access the filenames from the RDD:

    PairRDDFunctions<K, V> ds = ...; // getting the name and path for the file
    for (int i = 0; i < ds.values().getPartitions().length; i++) {
        UnionPartition upp = (UnionPartition) ds.values().getPartitions()[i];
        NewHadoopPartition npp = (NewHadoopPartition) upp.split();
        System.out.println("File " + npp.serializableHadoopSplit().value().toString());
    }

Thanks
Best Regards

On Tue, Aug 26, 2014 at 1:25 AM, yh18190 <yh18...@gmail.com> wrote:

> Hi Guys,
>
> I just want to know whether there is any way to determine which file is being handled by Spark from a group of files given as input inside a directory. Suppose I have 1000 input files; I want to determine which file is currently being handled by the Spark program, so that if an error creeps in at any point of time we can easily identify that particular file as the faulty one.
>
> Please let me know your thoughts.
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Request-for-Help-tp12776.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
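Outside of Spark, the underlying idea in the question, knowing which input file was being handled when an error crept in, can be sketched in plain Java. This is only an illustration under assumptions: the file names, the parse rule, and the in-memory map standing in for a directory of files are all made up here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FaultyFileFinder {
    // Hypothetical per-line parse step; throws NumberFormatException on bad input.
    static int parseLine(String line) {
        return Integer.parseInt(line.trim());
    }

    // Process a map of fileName -> lines, returning the names of files
    // whose content failed to parse (the "faulty" files from the question).
    static List<String> findFaultyFiles(Map<String, List<String>> files) {
        List<String> faulty = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : files.entrySet()) {
            try {
                for (String line : e.getValue()) {
                    parseLine(line);
                }
            } catch (NumberFormatException ex) {
                // We know exactly which file was being handled when the error occurred.
                faulty.add(e.getKey());
            }
        }
        return faulty;
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("part-0001", Arrays.asList("1", "2"));
        files.put("part-0002", Arrays.asList("3", "oops"));
        System.out.println(findFaultyFiles(files)); // prints [part-0002]
    }
}
```

The same pattern, tagging each record with its source file before parsing, is what makes the faulty file identifiable regardless of how many files the directory contains.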
Request for help in writing to Textfile
Hi Guys,

I am currently playing with huge data. I have an RDD which returns RDD[List[(tuples)]]. I need only the tuples to be written to the text file output using the saveAsTextFile function.

Example: val mod = modify.saveAsTextFile() returns

    List((20140813,4,141127,3,HYPHLJLU,HY,KNGHWEB,USD,144.00,662.40,KY1),
    (20140813,4,141127,3,HYPHLJLU,HY,DBLHWEB,USD,144.00,662.40,KY1))
    List((20140813,4,141127,3,HYPHLJLU,HY,KNGHWEB,USD,144.00,662.40,KY1),
    (20140813,4,141127,3,HYPHLJLU,HY,DBLHWEB,USD,144.00,662.40,KY1))

I need the following output, with only the tuple values, in a text file:

    20140813,4,141127,3,HYPHLJLU,HY,KNGHWEB,USD,144.00,662.40,KY1
    20140813,4,141127,3,HYPHLJLU,HY,DBLHWEB,USD,144.00,662.40,KY1

Please let me know if anybody has any idea regarding this without using the collect() function. Please help me.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Request-for-help-in-writing-to-Textfile-tp12744.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Request for Help
Hi Guys,

I just want to know whether there is any way to determine which file is being handled by Spark from a group of files given as input inside a directory. Suppose I have 1000 input files; I want to determine which file is currently being handled by the Spark program, so that if an error creeps in at any point of time we can easily identify that particular file as the faulty one.

Please let me know your thoughts.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Request-for-Help-tp12776.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
RE: Request for help in writing to Textfile
You can try to manipulate the string you want to output before saveAsTextFile, something like:

    modify.flatMap(x => x).map { x =>
      val s = x.toString
      s.subSequence(1, s.length - 1)
    }

There should be a more optimized way.

Best Regards,
Raymond Liu

-----Original Message-----
From: yh18190 [mailto:yh18...@gmail.com]
Sent: Monday, August 25, 2014 9:57 PM
To: u...@spark.incubator.apache.org
Subject: Request for help in writing to Textfile

Hi Guys,

I am currently playing with huge data. I have an RDD which returns RDD[List[(tuples)]]. I need only the tuples to be written to the text file output using the saveAsTextFile function.

Example: val mod = modify.saveAsTextFile() returns

    List((20140813,4,141127,3,HYPHLJLU,HY,KNGHWEB,USD,144.00,662.40,KY1),
    (20140813,4,141127,3,HYPHLJLU,HY,DBLHWEB,USD,144.00,662.40,KY1))
    List((20140813,4,141127,3,HYPHLJLU,HY,KNGHWEB,USD,144.00,662.40,KY1),
    (20140813,4,141127,3,HYPHLJLU,HY,DBLHWEB,USD,144.00,662.40,KY1))

I need the following output, with only the tuple values, in a text file:

    20140813,4,141127,3,HYPHLJLU,HY,KNGHWEB,USD,144.00,662.40,KY1
    20140813,4,141127,3,HYPHLJLU,HY,DBLHWEB,USD,144.00,662.40,KY1

Please let me know if anybody has any idea regarding this without using the collect() function. Please help me.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Request-for-help-in-writing-to-Textfile-tp12744.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
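The subSequence call in the reply above simply drops the surrounding parentheses that a tuple's toString adds. The same string manipulation can be sketched in plain Java outside of Spark; the sample value is taken from the question, and substring here plays the role of Scala's subSequence.

```java
public class TupleToLine {
    // Strip the leading '(' and trailing ')' that Tuple.toString wraps around the fields.
    static String stripParens(String s) {
        return s.substring(1, s.length() - 1);
    }

    public static void main(String[] args) {
        String tuple = "(20140813,4,141127,3,HYPHLJLU,HY,KNGHWEB,USD,144.00,662.40,KY1)";
        System.out.println(stripParens(tuple));
        // prints 20140813,4,141127,3,HYPHLJLU,HY,KNGHWEB,USD,144.00,662.40,KY1
    }
}
```

Applied inside a map over the flattened RDD, as in the reply, this produces one comma-separated line per tuple without ever calling collect() on the driver.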