Re: ExternalAppendOnlyMap throws no such element

2014-01-26 Thread guojc
Hi Patrick, I have created the JIRA: https://spark-project.atlassian.net/browse/SPARK-1045. It turns out the situation is related to joining two large RDDs, not to the combine process as previously thought. Best Regards, Jiacheng Guo On Mon, Jan 27, 2014 at 11:07 AM, guojc guoj...@gmail.com
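For context, a minimal sketch of the join shape SPARK-1045 describes, assuming hypothetical HDFS paths, tab-separated keys, and a local master; joining two large RDDs builds per-key buffers on the reduce side, which is where the spilled ExternalAppendOnlyMap comes into play:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._  // implicit pair-RDD functions such as join

    val sc = new SparkContext("local[4]", "join-sketch")

    // Two large keyed datasets; paths and the tab-separated parsing are placeholders.
    val left  = sc.textFile("hdfs:///data/left").map(line => (line.split("\t")(0), line))
    val right = sc.textFile("hdfs:///data/right").map(line => (line.split("\t")(0), line))

    // join cogroups both sides, building per-key buffers on the reduce side.
    left.join(right).count()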

Does foreach operation increase rdd lineage?

2014-01-24 Thread guojc
Hi, I'm writing a parallel MCMC program that keeps a very large dataset in memory and needs to update the dataset in place, avoiding additional copies. Should I use a foreach operation on the RDD to express the change, or do I have to create a new RDD after each sampling step? Thanks,
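A note for readers: foreach is an action executed for side effects only and returns Unit, so it cannot yield an updated RDD; mutating rows inside it also breaks Spark's assumption that a partition can be recomputed from its lineage. The idiomatic shape is to derive a new RDD per iteration and checkpoint occasionally. A minimal sketch, with a stub sample function standing in for the real MCMC step and a placeholder checkpoint directory:

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    val sc = new SparkContext("local[4]", "mcmc-sketch")
    sc.setCheckpointDir("/tmp/mcmc-ckpt")  // placeholder directory

    // Stub: the real Gibbs/Metropolis update would go here.
    def sample(row: Array[Double]): Array[Double] = row

    var state: RDD[Array[Double]] =
      sc.parallelize(1 to 1000).map(i => Array(i.toDouble)).cache()

    for (i <- 1 to 100) {
      state = state.map(sample).cache()   // a new RDD; lineage grows by one step
      if (i % 20 == 0) state.checkpoint() // truncate lineage periodically
    }
    state.count()  // force the final iteration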

Re: Does foreach operation increase rdd lineage?

2014-01-24 Thread guojc
to join together, you'd better keep them in the workers. 2014/1/24 guojc guoj...@gmail.com Hi, I'm writing a parallel MCMC program that keeps a very large dataset in memory and needs to update the dataset in place, avoiding additional copies. Should I use a foreach operation on the RDD

ExternalAppendOnlyMap throws no such element

2014-01-20 Thread guojc
Hi, I'm trying out the latest master branch of Spark for the exciting external hashmap feature. I have code that runs correctly on Spark 0.8.1, and I only made a change so that it spills to disk easily. However, I encountered a few task failures with java.util.NoSuchElementException
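For anyone reproducing this: in that era's master branch the external hashmap was controlled by the spark.shuffle.spill property, with spark.shuffle.memoryFraction governing how much memory aggregation may use before spilling. A rough sketch that forces the spill path on toy data (property names as they existed around Spark 0.9; treat the exact values as placeholders):

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._

    // In this era, properties were commonly set as JVM system properties
    // before the SparkContext was created.
    System.setProperty("spark.shuffle.spill", "true")
    // Shrink the aggregation buffer so even a toy job spills (test value only).
    System.setProperty("spark.shuffle.memoryFraction", "0.05")

    val sc = new SparkContext("local[2]", "spill-sketch")
    val pairs = sc.parallelize(1 to 1000000).map(i => (i % 10000, 1))
    // With spilling enabled, reduce-side aggregation runs through an
    // ExternalAppendOnlyMap that can overflow to disk.
    pairs.reduceByKey(_ + _).count()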

Re: App master failed to find application jar in the master branch on YARN

2013-11-19 Thread guojc
when you export SPARK_JAR and specify the --jar option. I'll try to reproduce the error tomorrow to see if a bug was introduced when I added the feature to run Spark from HDFS. Tom On Monday, November 18, 2013 11:13 AM, guojc guoj...@gmail.com wrote: Hi Tom, I'm on Hadoop 2.0.5. I can

Re: App master failed to find application jar in the master branch on YARN

2013-11-19 Thread guojc
, guojc guoj...@gmail.com wrote: Hi Tom, Thank you for your response. I have double-checked that I uploaded both jars into the same folder on HDFS. I think the fs.default.name property you pointed out is the old, deprecated name for the fs.defaultFS config, according to http://hadoop.apache.org/docs
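That reading is correct: fs.default.name is the deprecated key that Hadoop 2 maps onto fs.defaultFS. A quick sketch for checking which filesystem the client actually resolves, using the plain Hadoop API:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileSystem

    val conf = new Configuration()  // picks up core-site.xml from the classpath
    // The two keys should agree; fs.default.name is the deprecated spelling.
    println("fs.defaultFS    = " + conf.get("fs.defaultFS"))
    println("fs.default.name = " + conf.get("fs.default.name"))
    println("resolved URI    = " + FileSystem.get(conf).getUri)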

Re: App master failed to find application jar in the master branch on YARN

2013-11-18 Thread guojc
variable. You should only have to set the SPARK_JAR env variable. If that isn't the issue, let me know the build command you used, your Hadoop version, and your defaultFs for Hadoop. Tom On Saturday, November 16, 2013 2:32 AM, guojc guoj...@gmail.com wrote: hi, After reading about

App master failed to find application jar in the master branch on YARN

2013-11-16 Thread guojc
hi, After reading about the exciting progress in consolidating shuffle, I'm eager to try out the latest master branch. However, upon launching the example application, the job failed with a message that the app master failed to find the target jar. appDiagnostics: Application
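As an aside, the shuffle-consolidation work mentioned here was gated by a flag at the time; a one-line sketch of enabling it (assuming the System.setProperty idiom shown in the earlier sketch, before the SparkContext is created):

    // Shuffle file consolidation was off by default at the time; enable it
    // before the SparkContext is created.
    System.setProperty("spark.shuffle.consolidateFiles", "true")
    val sc = new org.apache.spark.SparkContext("local[2]", "consolidation-sketch")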

Re: Does spark RDD have a partitionedByKey

2013-11-16 Thread guojc
with using the Shark layer above Spark (and I think for many use cases the answer would be yes), then you can take advantage of Shark's co-partitioning. Or do something like https://github.com/amplab/shark/pull/100/commits Sent while mobile. Pls excuse typos etc. On Nov 16, 2013 2:48 AM, guojc

Does Spark have a partitionByKey function

2013-11-15 Thread guojc
Hi,

Does spark RDD have a partitionedByKey

2013-11-15 Thread guojc
Hi, I'm wondering whether a Spark RDD can have a partitionedByKey function. The use of this function would be to have an RDD distributed according to a certain partitioner and cache it. Then, subsequent joins between RDDs with the same partitioner would get a great speedup. Currently, we only have a
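What this asks for exists on pair RDDs as partitionBy: pre-partition both sides with the same partitioner and cache them, and a subsequent join can avoid re-shuffling either side. A minimal sketch on toy data:

    import org.apache.spark.HashPartitioner
    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._

    val sc = new SparkContext("local[4]", "copartition-sketch")
    val part = new HashPartitioner(16)

    // Partition both RDDs the same way up front and cache the results.
    val a = sc.parallelize((1 to 10000).map(i => (i, "a"))).partitionBy(part).cache()
    val b = sc.parallelize((1 to 10000).map(i => (i, "b"))).partitionBy(part).cache()

    // a and b share a partitioner, so this join needs no further shuffle
    // of either side.
    a.join(b).count()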

Re: Does spark RDD have a partitionedByKey

2013-11-15 Thread guojc
if the default partitioner does not suit your purpose. You can take a look at this: http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-amp-camp-2012-advanced-spark.pdf. Thanks, Meisam On Fri, Nov 15, 2013 at 6:54 AM, guojc guoj...@gmail.com wrote: Hi, I'm wondering whether

How to override YARN's default java.io.tmpdir and spark.local.dir

2013-11-15 Thread guojc
Hi, How can I override the default java.io.tmpdir and spark.local.dir on YARN? I tried setting SPARK_YARN_USER_ENV with SPARK_JAVA_OPTS, but it seems to have no effect. The location still comes from YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR, and that is a very small disk for me. Any suggestions?
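One approach from that era, sketched below: spark.local.dir (and java.io.tmpdir for the driver JVM) could be set as system properties before the SparkContext starts. As a caveat, on YARN the executor containers may still use the directories YARN assigns them; paths here are placeholders:

    // Point shuffle and spill files at a large local disk (placeholder paths),
    // before the SparkContext is created.
    System.setProperty("spark.local.dir", "/mnt/bigdisk/spark-local")
    System.setProperty("java.io.tmpdir", "/mnt/bigdisk/tmp")  // driver JVM only

    val sc = new org.apache.spark.SparkContext("local[2]", "localdir-sketch")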