I updated the code sample so people can better understand my inputs and
outputs.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Creating-RDD-from-Iterable-from-groupByKey-results-tp23328p23341.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Have you found an answer to this? I am looking for the exact same solution.
Hi DB Tsai-2,
I am trying to run a singleton SparkContext in my container (a Spring Boot
Tomcat container). When my application bootstraps, I create the
SparkContext and keep the reference for future job submissions. I got it
working perfectly with standalone Spark, but I am having trouble with YARN.
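Roughly what I have is a lazily-initialized holder, something like the sketch below (class and app names here are illustrative, not from my real application; the master is set to local[*] just for the sketch, in the container it is the standalone master URL or yarn-client):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical holder object: the context is created once, lazily,
// and the same reference is reused for every job submission.
object SparkContextHolder {
  // lazy val gives thread-safe, once-only initialization
  lazy val sc: SparkContext = {
    val conf = new SparkConf()
      .setAppName("spring-boot-spark")
      // "local[*]" for illustration only; in the container this is
      // the standalone master URL or "yarn-client"
      .setMaster("local[*]")
    new SparkContext(conf)
  }
}
```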
"I don't think you could avoid this
in general, right, in any system? "
Really? NoSQL databases do efficient lookups (and scans) based on key and
partition; look at Cassandra and HBase.
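And Spark itself gets you part of the way there: on a pair RDD with a known partitioner, lookup only scans the one partition the key maps to instead of the whole RDD. A minimal spark-shell-style sketch (the local master and toy data are just for illustration):

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("lookup-demo").setMaster("local[*]"))

// partitionBy attaches a HashPartitioner to the RDD, so Spark knows
// which partition each key lives in.
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
  .partitionBy(new HashPartitioner(4))

// With a known partitioner, lookup narrows the scan to the single
// partition "b" hashes to, rather than scanning every partition.
val hit = pairs.lookup("b")  // Seq(2)
```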
Looks like this has been supported since the 1.4 release :)
https://spark.apache.org/docs/1.4.1/api/scala/index.html#org.apache.spark.rdd.OrderedRDDFunctions
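For anyone landing here later, a minimal sketch of what I mean — filterByRange from OrderedRDDFunctions can prune whole partitions when the RDD has a RangePartitioner, e.g. after sortByKey (local master and toy data are just for illustration):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("range-demo").setMaster("local[*]"))

// sortByKey installs a RangePartitioner, so filterByRange can skip
// partitions that cannot contain keys in the requested range.
val sorted = sc.parallelize((1 to 10).map(i => (i, i * i))).sortByKey()

// Bounds are inclusive: keeps keys 3 through 7.
val inRange = sorted.filterByRange(3, 7).keys.collect()
```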
Are you having this issue with Spark 1.5 as well? We had a similar OOM issue
and were told by Databricks to upgrade to 1.5 to resolve it. I guess they
are trying to sell Tachyon :)
For uniform partitioning, you can try a custom Partitioner.
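A minimal sketch of what a custom Partitioner looks like (the class name and round-robin-by-key scheme are just illustrative; you would pick a getPartition that matches your key distribution):

```scala
import org.apache.spark.{Partitioner, SparkConf, SparkContext}

// Illustrative custom partitioner: spreads integer keys uniformly by
// key value instead of relying on hashCode.
class UniformIntPartitioner(parts: Int) extends Partitioner {
  override def numPartitions: Int = parts
  override def getPartition(key: Any): Int = key match {
    case i: Int => ((i % parts) + parts) % parts  // non-negative modulo
    case _      => 0
  }
}

val sc = new SparkContext(
  new SparkConf().setAppName("partitioner-demo").setMaster("local[*]"))

// Keys 0..99 land 25 per partition with 4 partitions.
val counts = sc.parallelize((0 until 100).map(i => (i, i)))
  .partitionBy(new UniformIntPartitioner(4))
  .mapPartitions(it => Iterator(it.size), preservesPartitioning = true)
  .collect()
```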