Hi Patrick,
I have created the JIRA
https://spark-project.atlassian.net/browse/SPARK-1045. It turns out the
situation is related to joining two large RDDs, not to the combine
process as previously thought.
Best Regards,
Jiacheng Guo
On Mon, Jan 27, 2014 at 11:07 AM, guojc guoj...@gmail.com
Hi,
I'm writing a parallel MCMC program that holds a very large dataset in
memory, and I need to update the dataset in place and avoid creating an
additional copy. Should I use a foreach operation on the RDD to express the
change, or do I have to create a new RDD after each sampling step?
Thanks,
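For what it's worth: RDDs are immutable, so the usual pattern is to derive a
new RDD from the cached one on each iteration and swap which copy is
persisted, rather than mutating in place with foreach. A minimal sketch of
that pattern (the sample function below is a hypothetical stand-in for one
sampling step):

import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

// Hypothetical per-record update; stands in for one MCMC sampling step.
def sample(state: Array[Double]): Array[Double] =
  state.map(_ + scala.util.Random.nextGaussian())

def run(sc: SparkContext, iterations: Int): Unit = {
  var dataset = sc.parallelize(Seq.fill(1000)(Array.fill(10)(0.0)))
    .persist(StorageLevel.MEMORY_ONLY)

  for (_ <- 1 to iterations) {
    // Each sampling step produces a new cached RDD; the previous copy is
    // unpersisted so two full versions are not kept in memory at once.
    val next = dataset.map(sample).persist(StorageLevel.MEMORY_ONLY)
    next.count()        // materialize before dropping the old copy
    dataset.unpersist()
    dataset = next
  }
}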
to join
together, you'd better keep them in workers.
2014/1/24 guojc guoj...@gmail.com
Hi,
I'm writing a parallel MCMC program that holds a very large dataset
in memory, and I need to update the dataset in place and avoid creating an
additional copy. Should I use a foreach operation on the RDD
Hi,
I'm trying out the latest master branch of Spark for the exciting external
hashmap feature. I have code that runs correctly on Spark 0.8.1, and
I only made a change so that it spills to disk more easily. However, I
encounter a few task failures of
java.util.NoSuchElementException
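(Not the actual job code, just a minimal sketch of the kind of change meant
here, assuming the spill behaviour is controlled by the spark.shuffle.spill
and spark.shuffle.memoryFraction properties:)

import org.apache.spark.{SparkConf, SparkContext}

// Assumed configuration keys: spark.shuffle.spill toggles the external
// hashmap, and spark.shuffle.memoryFraction caps in-memory shuffle state
// before it spills to disk.
val conf = new SparkConf()
  .setAppName("external-hashmap-test")
  .set("spark.shuffle.spill", "true")
  .set("spark.shuffle.memoryFraction", "0.2") // low value so spilling triggers easily
val sc = new SparkContext(conf)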
when you export SPARK_JAR and specify the --jar option.
I'll try to reproduce the error tomorrow to see whether a bug was introduced
when I added the feature to run Spark from HDFS.
Tom
On Monday, November 18, 2013 11:13 AM, guojc guoj...@gmail.com wrote:
Hi Tom,
I'm on Hadoop 2.0.5. I can
, guojc guoj...@gmail.com wrote:
Hi Tom,
Thank you for your response. I have double-checked that I uploaded
both jars to the same folder on HDFS. I think the fs.default.name property
you pointed out is the old deprecated name for the fs.defaultFS config,
according to
http://hadoop.apache.org/docs
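(A quick way to check which value actually wins, using Hadoop's
Configuration API; this is just an illustrative snippet, and both keys
should resolve to the same filesystem URI:)

import org.apache.hadoop.conf.Configuration

// fs.default.name is the deprecated key; fs.defaultFS is its replacement.
// Hadoop's deprecation handling maps the old key onto the new one.
val hadoopConf = new Configuration()
println("fs.defaultFS    = " + hadoopConf.get("fs.defaultFS"))
println("fs.default.name = " + hadoopConf.get("fs.default.name"))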
variable.
You should only have to set the SPARK_JAR env variable.
If that isn't the issue, let me know the build command you used, your Hadoop
version, and your defaultFs for Hadoop.
Tom
On Saturday, November 16, 2013 2:32 AM, guojc guoj...@gmail.com wrote:
hi,
After reading about
Hi,
After reading about the exciting progress in consolidating shuffle, I'm
eager to try out the latest master branch. However, upon launching the
example application, the job failed with a message that the app master failed
to find the target jar. appDiagnostics: Application
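(For reference, a minimal sketch of how shuffle file consolidation is
switched on; the property name spark.shuffle.consolidateFiles and the
pre-SparkConf System.setProperty style are assumptions from the 0.8.x docs:)

import org.apache.spark.SparkContext

// spark.shuffle.consolidateFiles merges the many per-map shuffle files into
// fewer, larger files; it must be set before the SparkContext is created.
System.setProperty("spark.shuffle.consolidateFiles", "true")
val sc = new SparkContext("local[2]", "shuffle-consolidation-test")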
with using the Shark layer above Spark (and I think
for many use cases the answer would be yes), then you can take advantage
of Shark's co-partitioning. Or do something like
https://github.com/amplab/shark/pull/100/commits
Sent while mobile. Pls excuse typos etc.
On Nov 16, 2013 2:48 AM, guojc
Hi,
Hi,
I'm wondering whether a Spark RDD can have a partitionedByKey function? The
use of this function would be to have an RDD distributed according to a
certain partitioner and cache it. Then further joins between RDDs with the
same partitioner would get a great speedup. Currently, we only have a
if the
default partitioner does not suit your purpose.
You can take a look at this:
http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-amp-camp-2012-advanced-spark.pdf
Thanks,
Meisam
On Fri, Nov 15, 2013 at 6:54 AM, guojc guoj...@gmail.com wrote:
Hi,
I'm wondering whether
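(For reference, the existing partitionBy on pair RDDs plus cache gives
roughly the behaviour asked about: pre-partition both sides with the same
partitioner, cache them, and the join can reuse that layout instead of
reshuffling. A minimal sketch with illustrative names:)

import org.apache.spark.{HashPartitioner, SparkContext}
import org.apache.spark.SparkContext._   // pair RDD functions

def copartitionedJoin(sc: SparkContext): Unit = {
  val part = new HashPartitioner(64)

  // Both RDDs are hash-partitioned with the same partitioner and cached, so
  // the subsequent join can use the existing layout rather than reshuffling.
  val left  = sc.parallelize(1 to 100000).map(i => (i, i.toString)).partitionBy(part).cache()
  val right = sc.parallelize(1 to 100000).map(i => (i, i * 2)).partitionBy(part).cache()

  val joined = left.join(right)   // co-partitioned inputs avoid a full shuffle
  println(joined.count())
}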
Hi,
How can I override the default java.io.tmpdir and spark.local.dir on
YARN? I have tried setting SPARK_YARN_USER_ENV with SPARK_JAVA_OPTS, but it
seems to have no effect. The location is still taken from
YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR, and that is a very small disk
for me. Any suggestions?