Re: Creating RDD from Iterable from groupByKey results

2015-06-16 Thread nir
I updated the code sample so people can better understand my inputs and outputs.
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Creating-RDD-from-Iterable-from-groupByKey-results-tp23328p23341.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
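The thread asks how to get an RDD back out of the `Iterable` that `groupByKey` produces. A minimal sketch of the usual answer (assuming Spark's Scala API; the app name and sample data are made up for illustration): you cannot create an RDD inside a transformation running on the workers, so instead of wrapping each `Iterable` in its own RDD, flatten the grouped values back into a single RDD.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object GroupedToRdd {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("grouped-to-rdd").setMaster("local[*]"))
    val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))

    // groupByKey yields RDD[(String, Iterable[Int])]; sc.parallelize
    // cannot be called on a worker, so flatten instead of re-wrapping.
    val grouped   = pairs.groupByKey()
    val flattened = grouped.flatMap { case (k, vs) => vs.map(v => (k, v)) }

    flattened.collect().foreach(println)
    sc.stop()
  }
}
```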

Re: RDD of Iterable[String]

2015-06-15 Thread nir
Have you found an answer to this? I am also looking for the exact same solution.
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/RDD-of-Iterable-String-tp15016p23329.html

Re: trying to understand yarn-client mode

2015-08-05 Thread nir
Hi DB Tsai-2, I am trying to run a singleton SparkContext inside my container (a Spring Boot Tomcat container). When my application bootstraps, I create the SparkContext and keep the reference for future job submissions. I got this working perfectly with standalone Spark, but I am having trouble with YARN
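The pattern described above can be sketched roughly as follows (a hedged illustration, not the poster's actual code; the object name and app name are hypothetical). In yarn-client mode the driver runs inside the web container's JVM, so the Hadoop/YARN configuration (`HADOOP_CONF_DIR`) must be visible to that JVM.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// One lazily-created SparkContext shared by the whole web application;
// every job submission reuses this reference instead of creating a new
// context (Spark allows only one active SparkContext per JVM).
object SparkContextHolder {
  lazy val sc: SparkContext = {
    val conf = new SparkConf()
      .setAppName("embedded-driver")
      .setMaster("yarn-client")   // standalone would be "spark://host:7077"
    new SparkContext(conf)
  }
}
```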

Re: Does filter on an RDD scan every data item ?

2016-01-23 Thread nir
"I don't think you could avoid this in general, right, in any system?" Really? NoSQL databases do efficient lookups (and scans) based on key and partition; look at Cassandra and HBase.

Re: Does filter on an RDD scan every data item ?

2016-01-23 Thread nir
Looks like this has been supported since the 1.4 release :) https://spark.apache.org/docs/1.4.1/api/scala/index.html#org.apache.spark.rdd.OrderedRDDFunctions
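A minimal sketch of what `OrderedRDDFunctions` enables (sample data invented for illustration): `filterByRange`, available since Spark 1.4, can prune partitions when the RDD has a known range partitioner, so a key-range filter does not have to scan every element.

```scala
// Assumes an existing SparkContext `sc`.
val pairs  = sc.parallelize((1 to 1000).map(i => (i, i.toString)))
val sorted = pairs.sortByKey()              // range-partitioned => pruning possible
val slice  = sorted.filterByRange(100, 200) // skips partitions outside [100, 200]
```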

Re: Off-heap memory usage of Spark Executors keeps increasing

2016-01-26 Thread nir
Are you having this issue with Spark 1.5 as well? We had a similar OOM issue and were told by Databricks to upgrade to 1.5 to resolve it. I guess they are trying to sell Tachyon :)

Re: java.lang.OutOfMemoryError: Requested array size exceeds VM limit

2016-03-14 Thread nir
For uniform partitioning, you can try a custom Partitioner.
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-OutOfMemoryError-Requested-array-size-exceeds-VM-limit-tp16809p26477.html
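A hedged sketch of the suggestion above (the class name and hash re-mix are hypothetical; adapt to your key type): subclass `org.apache.spark.Partitioner` and spread skewed keys more evenly than the default `HashPartitioner` would.

```scala
import org.apache.spark.Partitioner

// Illustrative custom Partitioner: re-mixes the key's hash to spread
// keys that collide under the default HashPartitioner.
class SaltedPartitioner(partitions: Int) extends Partitioner {
  require(partitions > 0)
  override def numPartitions: Int = partitions
  override def getPartition(key: Any): Int = {
    val h = (key.hashCode * 31 + 17) % partitions
    if (h < 0) h + partitions else h   // Java % can be negative; fold back
  }
}

// usage: rdd.partitionBy(new SaltedPartitioner(64))
```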