[jira] [Commented] (SPARK-2421) Spark should treat writable as serializable for keys
[ https://issues.apache.org/jira/browse/SPARK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17466171#comment-17466171 ]

Guillaume Desforges commented on SPARK-2421:
--------------------------------------------

It seems that the issue was indeed resolved with the introduction of SerializableWritable:
https://spark.apache.org/docs/3.2.0/api/java/org/apache/spark/SerializableWritable.html

> Spark should treat writable as serializable for keys
> ----------------------------------------------------
>
>                 Key: SPARK-2421
>                 URL: https://issues.apache.org/jira/browse/SPARK-2421
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output, Java API
>    Affects Versions: 1.0.0
>            Reporter: Xuefu Zhang
>            Priority: Major
>
> It seems that Spark requires the key to be serializable (i.e. the class must implement
> the Serializable interface). In the Hadoop world, the Writable interface is used for the
> same purpose. A lot of existing classes, while writable, are not considered
> by Spark as serializable. It would be nice if Spark could treat Writable as
> serializable and automatically serialize and de-serialize these classes using
> the Writable interface.
> This was identified in HIVE-7279, but its benefits are global.

--
This message was sent by Atlassian Jira (v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
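The idea behind a wrapper like SerializableWritable can be sketched as follows. This is a simplified illustration of the pattern, not Spark's actual implementation (which also handles Hadoop configuration); the `Writable` and `IntWritable` classes below are minimal stand-ins for the Hadoop ones so the sketch is self-contained. Java serialization records the concrete class name, and the payload itself is delegated to the Writable's own write/readFields methods:

```java
import java.io.*;

// Stand-in for org.apache.hadoop.io.Writable so this sketch is
// self-contained; the real interface has the same two methods.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// A Writable that is NOT Serializable, like many Hadoop classes.
class IntWritable implements Writable {
    int value;
    public IntWritable() {}
    public IntWritable(int v) { value = v; }
    public void write(DataOutput out) throws IOException { out.writeInt(value); }
    public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}

// Sketch of the wrapper pattern: writeObject records the concrete class
// name, then delegates the payload to Writable.write; readObject
// re-instantiates the class reflectively and fills it via readFields.
class SerializableWritable<T extends Writable> implements Serializable {
    private transient T t;

    SerializableWritable(T t) { this.t = t; }
    T value() { return t; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.writeUTF(t.getClass().getName());  // record concrete type
        t.write(out);                          // Writable handles the payload
    }

    @SuppressWarnings("unchecked")
    private void readObject(ObjectInputStream in) throws IOException {
        try {
            t = (T) Class.forName(in.readUTF()).getDeclaredConstructor().newInstance();
            t.readFields(in);
        } catch (ReflectiveOperationException e) {
            throw new IOException(e);
        }
    }
}
```

With a wrapper like this, an otherwise non-serializable Writable can round-trip through ordinary Java serialization, which is what lets Spark ship such values between nodes.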
[jira] [Commented] (SPARK-2421) Spark should treat writable as serializable for keys
[ https://issues.apache.org/jira/browse/SPARK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148776#comment-15148776 ]

Sean Owen commented on SPARK-2421:
----------------------------------

Pretty much the same reason in all cases: no activity in 16 months, a nice-to-have (i.e. not a known bug), nobody asking for it, nobody putting any work into it, and no reason to expect activity. It's more fruitful to reflect reality -- and then, if desired, ask: why is nobody (like yourself) working on it, if you are interested? Or: reopen it, but please, only if there is a good-faith reason to expect someone will work on it imminently. Remember, things can be reopened later. Worst case, new issues can be opened. We can also use a different resolution like "Later" if people find that softer.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SPARK-2421) Spark should treat writable as serializable for keys
[ https://issues.apache.org/jira/browse/SPARK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148684#comment-15148684 ]

Xuefu Zhang commented on SPARK-2421:
------------------------------------

[~sowen], I saw you had closed this without giving any explanation. Do you mind sharing?

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SPARK-2421) Spark should treat writable as serializable for keys
[ https://issues.apache.org/jira/browse/SPARK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157936#comment-14157936 ]

Brian Husted commented on SPARK-2421:
-------------------------------------

To work around the problem, one must map the Writable to a String (org.apache.hadoop.io.Text in the case below). This is an issue when sorting large amounts of data, since Spark will attempt to write out (spill) the entire dataset to perform the conversion. On a 500GB file this fills up more than 100GB of space on each node in our 12-node cluster, which is very inefficient. We are currently using Spark 1.0.2. Any thoughts here are appreciated.

Our code that attempts to mimic a map/reduce sort in Spark:

// read in the Hadoop sequence file to sort
val file = sc.sequenceFile(input, classOf[Text], classOf[Text])

// this is the conversion we would like to avoid: mapping the Hadoop Text
// input to Strings so that sortByKey will run
val converted = file.map { case (k, v) => (k.toString, v.toString) }

// perform the sort on the converted data
val sortedOutput = converted.sortByKey(true, 1)

// write out the results as a sequence file
sortedOutput.saveAsSequenceFile(output, Some(classOf[DefaultCodec]))
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SPARK-2421) Spark should treat writable as serializable for keys
[ https://issues.apache.org/jira/browse/SPARK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069880#comment-14069880 ]

Sandy Ryza commented on SPARK-2421:
-----------------------------------

It should be relatively straightforward to add a WritableSerializer. One issue is that Spark doesn't pass the types in the conf the way MR does, so on the read side we need a way to know what kind of objects to instantiate. I'm messing around with a prototype that just writes out the class name as Text at the beginning of the stream.

--
This message was sent by Atlassian JIRA (v6.2#6252)
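The prototype idea described above -- writing the class name once at the head of the stream so the read side knows what to instantiate -- could look roughly like this. This is a hypothetical sketch, not Sandy's actual prototype; the `Writable` and `LongWritable` classes are minimal stand-ins for the Hadoop ones so the example is self-contained, and the `WritableWriter`/`WritableReader` names are invented for illustration:

```java
import java.io.*;

// Stand-in for org.apache.hadoop.io.Writable so this sketch is
// self-contained; the real interface carries the same read/write contract.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

class LongWritable implements Writable {
    long value;
    public LongWritable() {}
    public LongWritable(long v) { value = v; }
    public void write(DataOutput out) throws IOException { out.writeLong(value); }
    public void readFields(DataInput in) throws IOException { value = in.readLong(); }
}

// Write side: emit the concrete class name once at the head of the
// stream, then each record via Writable.write().
class WritableWriter {
    private final DataOutputStream out;
    private boolean headerWritten = false;

    WritableWriter(OutputStream os) { out = new DataOutputStream(os); }

    void writeRecord(Writable w) throws IOException {
        if (!headerWritten) {                    // class name only once per stream
            out.writeUTF(w.getClass().getName());
            headerWritten = true;
        }
        w.write(out);
    }
}

// Read side: recover the class from the stream header, then instantiate
// one object per record and populate it with readFields().
class WritableReader {
    private final DataInputStream in;
    private Class<?> cls;

    WritableReader(InputStream is) { in = new DataInputStream(is); }

    Writable readRecord() throws IOException {
        try {
            if (cls == null) cls = Class.forName(in.readUTF());
            Writable w = (Writable) cls.getDeclaredConstructor().newInstance();
            w.readFields(in);
            return w;
        } catch (ReflectiveOperationException e) {
            throw new IOException(e);
        }
    }
}
```

Paying for the class name once per stream rather than once per record keeps the overhead small, which addresses the concern that Spark, unlike MR, does not carry the key/value types in the job conf.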
[jira] [Commented] (SPARK-2421) Spark should treat writable as serializable for keys
[ https://issues.apache.org/jira/browse/SPARK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056677#comment-14056677 ]

Xuefu Zhang commented on SPARK-2421:
------------------------------------

CC [~rxin], [~hshreedharan]

--
This message was sent by Atlassian JIRA (v6.2#6252)