Github user dorx commented on the pull request: https://github.com/apache/spark/pull/1581#issuecomment-50101155 The issue is what other things we can reasonably serialize into 8 bytes. Not sure how other types of doubles are relevant here since the size would be different and cause problems right away. Longs are also 8 bytes, so would some scheme of serializing an array of 2 shorts/chars. It's a tradeoff between efficiency and safety. We can remove the magic byte assuming no one's ever going to serialize an RDD of Doubles into 8-byte arrays and then use a Long deser on the 8-byte array. A compromise would be embedding the type metadata in the RDD of bytearray itself so we don't incur the cost of per point blow-up.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---