Re: Writing all values for same key to one file

2016-08-09 Thread neil90
Why not just create a partition for the key you want to group by and save the
data there? Appending to a file that has already been written to HDFS isn't the
best idea, IMO.
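One possible reading of this suggestion is sketched below. It assumes the data can be handled as a DataFrame on Spark 2.x or later (`SparkSession`), with spark-sql on the classpath. The object name `PartitionByKeyWrite`, the column names `key`/`value`, and the sample rows are hypothetical. `repartition($"key")` sends every row for a key to one task, and `write.partitionBy("key")` then writes those rows under a single `key=<value>/` directory, so each key ends up in one file without any appending.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; pass a not-yet-existing output directory as args(0).
object PartitionByKeyWrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-by-key")
      .master("local[2]")                          // local mode, for illustration only
      .config("spark.sql.shuffle.partitions", "4") // keep the demo shuffle small
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: two values for key "a", one for key "b".
    val df = Seq(("a", "1"), ("b", "2"), ("a", "3")).toDF("key", "value")

    df.repartition($"key") // all rows with the same key go to the same task...
      .write
      .partitionBy("key")  // ...which writes them under <out>/key=<key>/
      .text(args(0))       // only the remaining "value" column goes into the files

    spark.stop()
  }
}
```

When submitting to a cluster you would drop the hard-coded `.master(...)`. The trade-off is that one key's rows all go through a single task, so a very large key becomes a single large task.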



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Writing-all-values-for-same-key-to-one-file-tp27455p27501.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Writing all values for same key to one file

2016-08-05 Thread colzer
In my opinion, appending to a file is probably not a good idea.
By using `MultipleTextOutputFormat`, you can write all values for a given
key into one directory (each write adds a new file there instead of appending).

For example:

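import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
import org.apache.spark.{SparkConf, SparkContext}

class RDDMultipleTextOutputFormat extends MultipleTextOutputFormat[Any, Any] {
  // Put each record under a directory named after its key; the file name is a
  // timestamp (maybe you can use the streaming batch time instead).
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    key.asInstanceOf[String] + "/" + System.currentTimeMillis()

  // Return null so only the value, not the key, is written to the file.
  override def generateActualKey(key: Any, value: Any): Any = null
}

// validateOutputSpecs=false lets later jobs write into an already existing output directory.
val sc = new SparkContext(new SparkConf().set("spark.hadoop.validateOutputSpecs", "false"))
sc.parallelize(Array("1", "2", "3"), 3)
  .map(a => (a, a))
  .saveAsHadoopFile("/Users/tmp", classOf[String], classOf[String],
    classOf[RDDMultipleTextOutputFormat])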






--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Writing-all-values-for-same-key-to-one-file-tp27455p27486.html



Re: Writing all values for same key to one file

2016-08-04 Thread rtijoriwala
Hi Colzer,
Thanks for the response. My main question was about writing one file per
"key", i.e. having a single file with all values for a given key. So in the
pseudocode I posted above, am I opening/creating the file in the right place?
Once the file is created and closed, I cannot append to it.

Thanks,
Ritesh
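A common way around the "cannot append" problem is to bring all values for a key together first, and only then create that key's file once and close it. The sketch below shows just that per-key step, under stated assumptions: it writes to the local filesystem with `java.nio` purely for illustration (on HDFS you would open files through Hadoop's `FileSystem` API instead), and the names `PerKeyFiles`, `writeGroups`, and the `<key>.txt` naming scheme are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths, StandardOpenOption}

object PerKeyFiles {
  // Hypothetical helper: writes each key's values to <dir>/<key>.txt.
  // Every file is opened exactly once with CREATE_NEW (which fails if the file
  // already exists), so all values for a key must be grouped before this runs.
  // dir is a String rather than a Path so a Spark closure calling this stays serializable.
  def writeGroups(dir: String, groups: Iterator[(String, Iterable[String])]): Unit =
    groups.foreach { case (key, values) =>
      val writer = Files.newBufferedWriter(
        Paths.get(dir, key + ".txt"), StandardCharsets.UTF_8,
        StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)
      try values.foreach { v => writer.write(v); writer.newLine() }
      finally writer.close() // closed once, after the key's last value
    }
}
```

In Spark this could be driven by something like `rdd.groupByKey().foreachPartition(it => PerKeyFiles.writeGroups(outDir, it))`, since `groupByKey` yields `(key, Iterable[value])` pairs with each key appearing exactly once. Note that `groupByKey` holds all values of a key in memory on one executor.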



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Writing-all-values-for-same-key-to-one-file-tp27455p27485.html



Re: Writing all values for same key to one file

2016-08-04 Thread colzer
For an RDD, you can use `saveAsHadoopFile` with a custom `MultipleOutputFormat`.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Writing-all-values-for-same-key-to-one-file-tp27455p27483.html



Re: Writing all values for same key to one file

2016-08-04 Thread ayan guha
Partition your data using the key:

rdd.partitionBy(new HashPartitioner(numPartitions))
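Hash-partitioning guarantees that all records for a key sit in the same partition, but one partition (and so one part file) can still hold several keys. The sketch below combines it with a custom `MultipleTextOutputFormat`, in the spirit of the `saveAsHadoopFile` approach discussed in this thread, so that each key gets exactly one file at `<out>/<key>/part-NNNNN`. It assumes spark-core with its Hadoop dependencies on the classpath; the names `KeyDirectoryOutputFormat`, `OneFilePerKey`, and the sample data are hypothetical.

```scala
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Hypothetical format: writes each record to <out>/<key>/<part file name>
// and drops the key, so each line holds only the value.
class KeyDirectoryOutputFormat extends MultipleTextOutputFormat[Any, Any] {
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    key.toString + "/" + name
  override def generateActualKey(key: Any, value: Any): Any = null
}

// Hypothetical example object; pass a not-yet-existing output directory as args(0).
object OneFilePerKey {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("one-file-per-key").setMaster("local[2]"))
    val pairs = sc.parallelize(Seq("a" -> "1", "b" -> "2", "a" -> "3", "c" -> "4"), 3)

    pairs
      .partitionBy(new HashPartitioner(2)) // all records for a key land in one partition
      .saveAsHadoopFile(args(0), classOf[String], classOf[String],
        classOf[KeyDirectoryOutputFormat]) // so one task writes that key's only file

    sc.stop()
  }
}
```

Because the file name reuses the task's own part-file name rather than a timestamp, a single task never opens two files for the same key, and no other task ever sees that key.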

On Fri, Aug 5, 2016 at 10:10 AM, rtijoriwala <tijoriwala.rit...@gmail.com>
wrote:

> Any recommendations? comments?


-- 
Best Regards,
Ayan Guha


Re: Writing all values for same key to one file

2016-08-04 Thread rtijoriwala
Any recommendations or comments?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Writing-all-values-for-same-key-to-one-file-tp27455p27480.html