Updating existing JSON files

2014-08-15 Thread ejb11235
I have a bunch of JSON files stored in HDFS that I want to read in, modify,
and write back out. I'm new to all this and am not sure if this is even the
right thing to do.

Basically, my JSON files contain my raw data, and I want to calculate some
derived data and add it to the existing data.

So first, is my basic approach to the problem flawed? Should I be placing
derived data somewhere else?

If not, how do I modify the existing JSON files?

Note: I have been able to read the JSON files into an RDD using
sqlContext.jsonFile, and save them back using RDD.saveAsTextFile(). But this
creates new files. Is there a way to overwrite the original files?
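
Independent of how Spark distributes the work, the per-record step here is just
a read-modify-write on each JSON object. A minimal sketch in plain Python
(the "values" field and the derived "total" field are made-up examples, not
from the original data):

```python
import json

def add_derived(line):
    """Parse one JSON record, attach a derived field, and re-serialize."""
    record = json.loads(line)
    # Hypothetical derived value: the sum of a raw "values" list.
    record["total"] = sum(record.get("values", []))
    return json.dumps(record, sort_keys=True)

raw = '{"id": 1, "values": [2, 3, 5]}'
print(add_derived(raw))  # {"id": 1, "total": 10, "values": [2, 3, 5]}
```

In Spark, a function like this would be applied per line with something like
rdd.map(add_derived) before saving the result with saveAsTextFile().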

Thanks!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Updating-exising-JSON-files-tp12211.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Updating existing JSON files

2014-08-16 Thread Sean Owen
If you mean you want to overwrite the file in place while you're
reading it, no, you can't do that with HDFS. That would be dicey on any
file system. If you just want to append to the file, yes, HDFS supports
appends. I am pretty certain Spark does not have a concept that maps
to appending, though I suppose you can put just about anything you
like in a function, including manually reading, computing, and
appending to an HDFS file.

I think it will be far easier to write different output files and then
afterward overwrite the originals with them.
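
That write-then-swap pattern can be sketched as below. This uses the local
filesystem for illustration only; on HDFS you would rename with `hadoop fs -mv`
or the FileSystem rename API instead, and the temp-file naming here is an
assumption, not anything Spark does for you:

```python
import os
import tempfile

def rewrite_in_place(path, transform):
    """Write transformed records to a temp file in the same directory,
    then atomically replace the original file. Local-filesystem stand-in
    for the write-new-output-then-rename approach on HDFS."""
    dirname = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    with os.fdopen(fd, "w") as out, open(path) as src:
        for line in src:
            out.write(transform(line.rstrip("\n")) + "\n")
    os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
```

The swap is atomic only because the temp file lives in the same directory
(and thus the same filesystem) as the original; readers see either the old
file or the new one, never a half-written mix.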

