[jira] [Commented] (SPARK-2421) Spark should treat writable as serializable for keys

2021-12-28 Thread Guillaume Desforges (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466171#comment-17466171
 ] 

Guillaume Desforges commented on SPARK-2421:


It seems that the issue was indeed resolved with the introduction of 
SerializableWritable:

https://spark.apache.org/docs/3.2.0/api/java/org/apache/spark/SerializableWritable.html
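For reference, the pattern behind SerializableWritable can be sketched in plain Scala with no Spark or Hadoop on the classpath. The names FakeWritable, SerializableWrapper, and WrapperDemo below are illustrative stand-ins, not Spark's actual classes: the idea is that the wrapper marks the wrapped (non-Serializable) field @transient and delegates Java serialization to the object's own write/readFields methods, which is the spirit of what SerializableWritable does for Hadoop Writables.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInput,
  DataOutput, ObjectInputStream, ObjectOutputStream}

// Illustrative stand-in for Hadoop's Writable: it knows how to write and
// read itself, but it does NOT implement java.io.Serializable.
class FakeWritable(var value: Int) {
  def write(out: DataOutput): Unit = out.writeInt(value)
  def readFields(in: DataInput): Unit = { value = in.readInt() }
}

// Wrapper in the spirit of org.apache.spark.SerializableWritable: the
// wrapped field is transient, so default Java serialization skips it, and
// the custom hooks delegate to the object's own write/readFields instead.
class SerializableWrapper(@transient var t: FakeWritable) extends Serializable {
  private def writeObject(out: ObjectOutputStream): Unit = {
    out.defaultWriteObject()
    t.write(out)
  }
  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    t = new FakeWritable(0)
    t.readFields(in)
  }
}

object WrapperDemo {
  // Round-trip a value through Java serialization via the wrapper.
  def roundTrip(v: Int): Int = {
    val bytes = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bytes)
    oos.writeObject(new SerializableWrapper(new FakeWritable(v)))
    oos.close()
    val ois = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    ois.readObject().asInstanceOf[SerializableWrapper].t.value
  }

  def main(args: Array[String]): Unit =
    println(roundTrip(42)) // prints 42: the wrapped value survives
}
```

The same trick generalizes to any class that can serialize its own state: the Serializable wrapper is what crosses the wire, and the wrapped object never has to implement Serializable itself.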

> Spark should treat writable as serializable for keys
> 
>
> Key: SPARK-2421
> URL: https://issues.apache.org/jira/browse/SPARK-2421
> Project: Spark
>  Issue Type: Improvement
>  Components: Input/Output, Java API
>Affects Versions: 1.0.0
>Reporter: Xuefu Zhang
>Priority: Major
>
> It seems that Spark requires that keys be serializable (i.e., that their 
> class implement the Serializable interface). In the Hadoop world, the 
> Writable interface is used for the same purpose. A lot of existing classes, 
> while writable, are not considered serializable by Spark. It would be nice 
> if Spark could treat Writable as serializable and automatically serialize 
> and de-serialize these classes using the Writable interface.
> This was identified in HIVE-7279, but its benefits are global.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2421) Spark should treat writable as serializable for keys

2016-02-16 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148776#comment-15148776
 ] 

Sean Owen commented on SPARK-2421:
--

Pretty much the same reason in all cases: no activity in 16 months, a 
nice-to-have (i.e. not a known bug), nobody asking for it or putting any work 
into it, and no reason to expect activity. It's more fruitful to reflect 
reality -- and then, if desired, ask why nobody (like yourself) is working on 
it if there is interest.

Or: reopen it, but please, only if there is a good-faith reason to expect 
someone will work on it imminently. Remember, things can be reopened later. 
Worst case, new issues can be opened. We can also use a different resolution 
like "Later" if people find that softer.




[jira] [Commented] (SPARK-2421) Spark should treat writable as serializable for keys

2016-02-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148684#comment-15148684
 ] 

Xuefu Zhang commented on SPARK-2421:


[~sowen], I saw you had closed this without giving any explanation. Do you mind 
sharing?




[jira] [Commented] (SPARK-2421) Spark should treat writable as serializable for keys

2014-10-03 Thread Brian Husted (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157936#comment-14157936
 ] 

Brian Husted commented on SPARK-2421:
-

To work around the problem, one must map the Writable to a String 
(org.apache.hadoop.io.Text in the case below). This is an issue when sorting 
large amounts of data, since Spark will attempt to write out (spill) the 
entire dataset to perform the data conversion. On a 500GB file this fills up 
more than 100GB of space on each node of our 12-node cluster, which is very 
inefficient. We are currently using Spark 1.0.2. Any thoughts here are 
appreciated.

Our code that attempts to mimic map/reduce sort in Spark:

import org.apache.hadoop.io.Text
import org.apache.hadoop.io.compress.DefaultCodec

// Read in the Hadoop sequence file to sort.
val file = sc.sequenceFile(input, classOf[Text], classOf[Text])

// This is the conversion we would like to avoid: mapping the Hadoop Text
// keys and values to Strings so that sortByKey will run.
val converted = file.map { case (k, v) => (k.toString, v.toString) }

// Perform the sort on the converted data.
val sortedOutput = converted.sortByKey(true, 1)

// Write out the results as a sequence file.
sortedOutput.saveAsSequenceFile(output, Some(classOf[DefaultCodec]))




[jira] [Commented] (SPARK-2421) Spark should treat writable as serializable for keys

2014-07-22 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069880#comment-14069880
 ] 

Sandy Ryza commented on SPARK-2421:
---

It should be relatively straightforward to add a WritableSerializer.  One issue 
is that Spark doesn't pass the types in the conf in the way MR does, so on the 
read side we need a way to know what kind of objects to instantiate.  I'm 
messing around with a prototype that just writes out the class name as Text at 
the beginning of the stream.
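That read-side problem can be sketched in self-contained Scala. The names WritableLike, IntBox, and WritableStream below are illustrative, not Spark's actual WritableSerializer API: the writer prefixes the stream with the class name, and the reader uses that name to decide what to instantiate before calling readFields. A prototype like the one described would presumably instantiate via reflection; a small factory table keeps this sketch runnable without reflection pitfalls.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInput,
  DataInputStream, DataOutput, DataOutputStream}

// Illustrative Writable-like contract (not Hadoop's actual interface).
trait WritableLike {
  def write(out: DataOutput): Unit
  def readFields(in: DataInput): Unit
}

class IntBox extends WritableLike {
  var value: Int = 0
  def write(out: DataOutput): Unit = out.writeInt(value)
  def readFields(in: DataInput): Unit = { value = in.readInt() }
}

object WritableStream {
  // Reader-side factories keyed by class name; a real prototype would
  // likely use reflection (Class.forName + newInstance) here instead.
  val factories: Map[String, () => WritableLike] =
    Map(classOf[IntBox].getName -> (() => new IntBox))

  // Write side: the class name goes first, then the object's own bytes.
  def serialize(w: WritableLike): Array[Byte] = {
    val buf = new ByteArrayOutputStream()
    val out = new DataOutputStream(buf)
    out.writeUTF(w.getClass.getName)
    w.write(out)
    out.close()
    buf.toByteArray
  }

  // Read side: the class name at the head of the stream tells us what to
  // instantiate; the fresh instance then reads its own fields.
  def deserialize(bytes: Array[Byte]): WritableLike = {
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    val w = factories(in.readUTF())()
    w.readFields(in)
    w
  }

  def main(args: Array[String]): Unit = {
    val box = new IntBox
    box.value = 7
    val copy = deserialize(serialize(box)).asInstanceOf[IntBox]
    println(copy.value) // prints 7
  }
}
```

The per-record cost of writing the full class name is what a real implementation would want to avoid (e.g. by writing it once per stream, as suggested above, rather than once per record).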



[jira] [Commented] (SPARK-2421) Spark should treat writable as serializable for keys

2014-07-09 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056677#comment-14056677
 ] 

Xuefu Zhang commented on SPARK-2421:


CC [~rxin], [~hshreedharan]
