Re: saveAsObjectFile is actually saveAsSequenceFile

2015-01-14 Thread Sean Owen
Yeah, it is actually serializing elements after chunking them into arrays of 10 elements each, so there is not a key-value pair in the SequenceFile for each individual element. That is how objectFile() reads it and flatMaps it, and the docs say that the intent is that this is an opaque, not-guaranteed
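
A rough sketch of the write path being described, assuming the Spark 1.x Scala API. The helper name saveLikeObjectFile is illustrative, not Spark API; it mimics the behavior rather than calling Spark's private serialization utilities:

import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import scala.reflect.ClassTag
import org.apache.hadoop.io.{BytesWritable, NullWritable}
import org.apache.spark.SparkContext._  // saveAsSequenceFile implicit on older Spark versions
import org.apache.spark.rdd.RDD

// Illustrative helper: group elements into arrays of 10, Java-serialize each
// array, and write one (NullWritable, BytesWritable) record per array into a
// SequenceFile -- which is why there is no key-value pair per element.
def saveLikeObjectFile[T: ClassTag](rdd: RDD[T], path: String): Unit = {
  rdd.mapPartitions(_.grouped(10).map(_.toArray))
    .map { chunk =>
      val bos = new ByteArrayOutputStream()
      val oos = new ObjectOutputStream(bos)
      oos.writeObject(chunk)
      oos.close()
      (NullWritable.get(), new BytesWritable(bos.toByteArray))
    }
    .saveAsSequenceFile(path)
}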

saveAsObjectFile is actually saveAsSequenceFile

2015-01-13 Thread Kevin Burton
This is interesting. I’m using ObjectInputStream to try to read a file written with saveAsObjectFile… but it’s not working. The documentation says: “Write the elements of the dataset in a simple format using Java serialization, which can then be loaded using SparkContext.objectFile().” … but
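
For reference, the round trip the documentation describes looks roughly like this (local-mode Spark Scala sketch; the app name and output path are just placeholders). It works through SparkContext.objectFile, even though reading a part file directly with ObjectInputStream does not, because the files on disk are SequenceFiles rather than raw ObjectOutputStream output:

import org.apache.spark.{SparkConf, SparkContext}

object ObjectFileRoundTrip {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("objectfile-demo").setMaster("local[2]"))

    // Write with saveAsObjectFile, read back with SparkContext.objectFile,
    // as the documentation describes.
    sc.parallelize(1 to 100).saveAsObjectFile("/tmp/ints-as-objectfile")
    val restored = sc.objectFile[Int]("/tmp/ints-as-objectfile")
    println(restored.count()) // 100

    sc.stop()
  }
}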

Re: saveAsObjectFile is actually saveAsSequenceFile

2015-01-13 Thread Sean Owen
Yes, that's even what the objectFile javadoc says. It is expecting a SequenceFile with NullWritable keys and BytesWritable values containing the serialized values. This looks correct to me. On Tue, Jan 13, 2015 at 8:39 AM, Kevin Burton bur...@spinn3r.com wrote: This is interesting. I’m using
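
A sketch of what that implies if you wanted to read the data back by hand, roughly mirroring what SparkContext.objectFile does. The helper name readLikeObjectFile is illustrative, not part of the Spark API:

import java.io.{ByteArrayInputStream, ObjectInputStream}
import java.util.Arrays
import scala.reflect.ClassTag
import org.apache.hadoop.io.{BytesWritable, NullWritable}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Illustrative helper: read the SequenceFile of (NullWritable, BytesWritable)
// records, Java-deserialize each value back into an Array of elements, and
// flatMap the arrays into individual records.
def readLikeObjectFile[T: ClassTag](sc: SparkContext, path: String): RDD[T] = {
  sc.sequenceFile(path, classOf[NullWritable], classOf[BytesWritable])
    .flatMap { case (_, value) =>
      // BytesWritable.getBytes may be padded, so trim to getLength first.
      val bytes = Arrays.copyOf(value.getBytes, value.getLength)
      val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
      try in.readObject().asInstanceOf[Array[T]] finally in.close()
    }
}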

Re: saveAsObjectFile is actually saveAsSequenceFile

2015-01-13 Thread Kevin Burton
Yes.. but this isn’t what the main documentation says. The file format isn’t very discoverable.. Also, the documentation doesn’t say anything about the grouping by 10.. what’s that about? Kevin On Tue, Jan 13, 2015 at 2:28 AM, Sean Owen so...@cloudera.com wrote: Yes, that's even what the