Re: Writing all values for same key to one file
Why not just create a partition for the key you want to group by and save the data there? Appending to a file already written to HDFS isn't the best idea, IMO.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Writing-all-values-for-same-key-to-one-file-tp27455p27501.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe e-mail: user-unsubscr...@spark.apache.org
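One way to read the suggestion above is Spark's DataFrame writer with `partitionBy`, which creates one `key=<value>` directory per key. The following is only a minimal sketch under assumptions that are not in the thread: Spark 2.x with `SparkSession`, a local master, CSV output, and the hypothetical column names `key` and `value`.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name; the data and column names are made up for illustration.
object PartitionedWriteSketch {
  def main(args: Array[String]): Unit = {
    // Output directory: first argument, or an assumed default path.
    val outputDir = if (args.nonEmpty) args(0) else "/tmp/values-by-key"

    val spark = SparkSession.builder()
      .appName("partitioned-write-sketch")
      .master("local[*]") // assumption: local run; drop this when using spark-submit
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("a", "1"), ("a", "2"), ("b", "3")).toDF("key", "value")

    df.repartition($"key")  // bring all rows of a key into the same task
      .write
      .mode("overwrite")
      .partitionBy("key")   // one sub-directory per key: key=a/, key=b/
      .csv(outputDir)

    spark.stop()
  }
}
```

Repartitioning by the key column before the write means each key's rows are handled by a single task, so each `key=<value>` directory should end up with a single part file. Nothing is appended to an existing file; a rerun in `overwrite` mode replaces the output.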
Re: Writing all values for same key to one file
In my opinion, appending to a file is probably not a good idea. With `MultipleTextOutputFormat`, you can instead write all values for a given key into a directory named after that key. For example:

    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
    import org.apache.spark.{SparkConf, SparkContext}

    class RDDMultipleTextOutputFormat extends MultipleTextOutputFormat[Any, Any] {
      // Route each record to <key>/<timestamp>; maybe you can use the stream time instead.
      override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
        key.asInstanceOf[String] + "/" + System.currentTimeMillis()

      // Return a null key so that only the value is written to the file.
      override def generateActualKey(key: Any, value: Any): Any = null
    }

    // Disable output-spec validation so that an existing output directory can be reused.
    val sc = new SparkContext(new SparkConf().set("spark.hadoop.validateOutputSpecs", "false"))
    sc.parallelize(Array("1", "2", "3"), 3)
      .map(a => (a, a))
      .saveAsHadoopFile("/Users/tmp", classOf[String], classOf[String], classOf[RDDMultipleTextOutputFormat])
Re: Writing all values for same key to one file
Hi Colzer,

Thanks for the response. My main question was about writing one file per key, i.e. having a file with all values for a given key. So in the pseudocode that I have above, am I opening/creating the file in the right place? Once the file is created and closed, I cannot append to it.

Thanks,
Ritesh
Re: Writing all values for same key to one file
For an RDD, you can use `saveAsHadoopFile` with a custom `MultipleOutputFormat`.
Re: Writing all values for same key to one file
Partition your data using the key: rdd.partitionBy(partitioner)

On Fri, Aug 5, 2016 at 10:10 AM, rtijoriwala wrote:
> Any recommendations? comments?

--
Best Regards,
Ayan Guha
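To make the partitioning suggestion above concrete: on a pair RDD, `partitionBy` takes a `Partitioner` such as `HashPartitioner`, and afterwards every record with a given key sits in the same partition. The sketch below is illustrative only; the object name, the sample data and the partition count of 4 are assumptions, not taken from the thread.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Hypothetical object name; sample data is made up for illustration.
object PartitionByKeySketch {
  // Returns, for each key, the number of distinct partitions that hold it.
  def partitionsPerKey(sc: SparkContext): scala.collection.Map[String, Long] = {
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)))

    // Co-locate all records that share a key in one partition.
    val byKey = pairs.partitionBy(new HashPartitioner(4))

    byKey
      .mapPartitionsWithIndex { (idx, records) => records.map { case (k, _) => (k, idx) } }
      .distinct()     // one (key, partitionIndex) pair per partition holding the key
      .countByKey()   // how many partitions each key appears in
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local run; drop setMaster when using spark-submit.
    val conf = new SparkConf().setAppName("partition-by-key-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    partitionsPerKey(sc).foreach { case (k, n) => println(s"$k is stored in $n partition(s)") }
    sc.stop()
  }
}
```

Note that partitioning alone gives one output file per partition, not per key: with a hash partitioner, several keys can share a partition, so a saved part file may contain values for more than one key. Getting exactly one file per key needs something extra, such as a custom output format or a partitioner that maps each key to its own partition.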
Re: Writing all values for same key to one file
Any recommendations? comments?