Re: Codegen In Shuffle

2015-11-04
I see. Thanks very much.
2015-11-04 16:25 GMT+08:00 Reynold Xin <r...@databricks.com>:
> GenerateUnsafeProjection -- projects any internal row data structure
> directly into bytes (UnsafeRow).
>
> On Wed, Nov 4, 2015 at 12:21 AM, 牛兆捷 <nzjem...@gmail.com> wrote:
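Following up on Reynold's pointer: the output of GenerateUnsafeProjection is the UnsafeRow binary format. Below is a rough Python sketch of that layout, assuming the Spark 1.5-era format as I understand it (a null bitset rounded up to whole 8-byte words, one 8-byte slot per field, and a variable-length region whose offset and size are packed into the field's slot) and little-endian byte order. The helpers `to_unsafe_bytes` and `bitset_width` are hypothetical and only handle int, str and None; this illustrates the target format, not the Java code Spark actually generates.

```python
import struct

WORD = 8  # UnsafeRow is organised in 8-byte words


def bitset_width(num_fields: int) -> int:
    """Bytes for the null bitset: one bit per field, rounded up to whole 64-bit words."""
    return ((num_fields + 63) // 64) * WORD


def to_unsafe_bytes(values) -> bytes:
    """Hypothetical sketch of an UnsafeRow-like encoding for int, str and None fields.

    Assumes little-endian words (as on x86); this is not Spark's generated code.
    """
    n = len(values)
    null_bits = 0
    slots = []                  # fixed-length region: one 8-byte word per field
    var_region = bytearray()    # variable-length region, placed after the slots
    var_start = bitset_width(n) + n * WORD

    for i, v in enumerate(values):
        if v is None:
            null_bits |= 1 << i                   # mark field i as null
            slots.append(struct.pack("<q", 0))
        elif isinstance(v, int):
            slots.append(struct.pack("<q", v))    # primitive stored inline
        elif isinstance(v, str):
            data = v.encode("utf-8")
            offset = var_start + len(var_region)  # relative to the start of the row
            # Offset in the upper 32 bits, byte length in the lower 32 bits.
            slots.append(struct.pack("<Q", (offset << 32) | len(data)))
            padded_len = len(data) + (-len(data)) % WORD
            var_region += data.ljust(padded_len, b"\x00")  # keep word alignment
        else:
            raise TypeError(f"unsupported field type: {type(v).__name__}")

    bitset = null_bits.to_bytes(bitset_width(n), "little")
    return bitset + b"".join(slots) + bytes(var_region)


row = to_unsafe_bytes([42, "spark", None])
print(len(row))  # prints 40: 8-byte bitset + 3 slots + "spark" padded to 8 bytes
```

Packing offset and length into the same slot keeps every field at a fixed position, so a reader can jump straight to field i without scanning earlier variable-length fields.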

Codegen In Shuffle

2015-11-04
Dear all: The Tungsten project has mentioned that it applies code generation to speed up the conversion of data from the in-memory binary format to the wire protocol for shuffle. Where can I find the related implementation in the Spark code base? -- Regards, Zhaojie

RDD checkpoint

2015-07-13
The checkpointed RDD is computed twice. Why not write the checkpoint for the RDD as soon as it has been computed? Is there a particular reason for this? -- Regards, Zhaojie
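A likely explanation, for anyone finding this later: in Spark 1.x, `RDD.checkpoint()` only marks the RDD, and the data is written by a separate job launched after the first action on it finishes. Unless the RDD is persisted, its lineage is therefore evaluated once for the action and again for the checkpoint write, which is why the `RDD.checkpoint` Scaladoc strongly recommends persisting the RDD first. The toy model below only counts lineage evaluations; the class `ToyRDD` and its methods are hypothetical stand-ins, not Spark's API.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD; it only counts how often its lineage is evaluated."""

    def __init__(self, compute):
        self._compute = compute          # the "lineage": a function producing the data
        self.compute_count = 0
        self._persisted = False
        self._cache = None
        self._checkpoint_requested = False
        self.checkpoint_data = None      # stands in for files in the checkpoint directory

    def persist(self):
        self._persisted = True
        return self

    def checkpoint(self):
        # Only marks the RDD; nothing is written until an action runs.
        self._checkpoint_requested = True

    def _iterator(self):
        if self._cache is not None:
            return self._cache           # served from cache, no recomputation
        self.compute_count += 1
        data = list(self._compute())
        if self._persisted:
            self._cache = data
        return data

    def collect(self):
        result = self._iterator()        # job 1: the user's action
        if self._checkpoint_requested and self.checkpoint_data is None:
            # job 2: a separate pass over the RDD that writes the checkpoint
            self.checkpoint_data = list(self._iterator())
        return result


plain = ToyRDD(lambda: range(5))
plain.checkpoint()
plain.collect()
print(plain.compute_count)   # prints 2: once for collect, once for the checkpoint write

cached = ToyRDD(lambda: range(5)).persist()
cached.checkpoint()
cached.collect()
print(cached.compute_count)  # prints 1: the checkpoint job reads the cached data
```

Writing the checkpoint in a separate job keeps the checkpoint logic independent of whichever action happens to trigger it, at the cost of a second pass that caching makes cheap.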

Questions about Fault tolerance of Spark

2015-07-09
Hi all: We already know that Spark uses lineage to recompute RDDs when a failure occurs. I want to study the performance of this fault-tolerance approach and have some questions about it. 1) Is there any benchmark (or standard failure model) to test the fault tolerance of these kinds of

Re: How to benchmark SPARK apps?

2014-10-09
You can try https://github.com/databricks/spark-perf

Re: Could Spark make use of Intel Xeon Phi?

2014-10-03
What specific features of the Intel Xeon Phi could be utilized by Spark?
2014-10-03 18:09 GMT+08:00 余 浪 yulan...@gmail.com:
> Hi, I have set up Spark 1.0.2 on the cluster in standalone mode, and the input is managed by HDFS. One node of the cluster has Intel Xeon Phi 5110P

Workload for spark testing

2014-09-13
Hi all: We know that some of Spark's memory is used for computation (e.g., spark.shuffle.memoryFraction) and some is used for caching RDDs for future use (e.g., spark.storage.memoryFraction). Is there any existing workload that can utilize both of them over its life cycle? I want to do some
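To make the two knobs concrete: in the Spark 1.x static memory model (before the unified memory manager introduced in 1.6), each region is a fraction of roughly the executor's JVM heap, further scaled by an internal safety fraction. The sketch below assumes the 1.x defaults as I understand them (spark.storage.memoryFraction = 0.6 with spark.storage.safetyFraction = 0.9, and spark.shuffle.memoryFraction = 0.2 with spark.shuffle.safetyFraction = 0.8); the helper `region_sizes` is hypothetical and only does the arithmetic.

```python
def region_sizes(heap_mb, storage_fraction=0.6, storage_safety=0.9,
                 shuffle_fraction=0.2, shuffle_safety=0.8):
    """Hypothetical helper for the Spark 1.x (pre-1.6) static memory model.

    Each region is roughly: executor heap * memoryFraction * safetyFraction.
    Returns (MB usable for cached RDDs, MB usable for shuffle buffers).
    """
    storage = heap_mb * storage_fraction * storage_safety
    shuffle = heap_mb * shuffle_fraction * shuffle_safety
    return storage, shuffle


# Sweep the storage/shuffle split for a 4 GB executor heap, as in a ratio study.
for storage_frac, shuffle_frac in [(0.6, 0.2), (0.4, 0.4), (0.2, 0.6)]:
    storage, shuffle = region_sizes(4096, storage_fraction=storage_frac,
                                    shuffle_fraction=shuffle_frac)
    print(f"storage={storage_frac}: cache {storage:.0f} MB, shuffle {shuffle:.0f} MB")
```

In a real run the two fractions would be set per application (e.g. through SparkConf or spark-submit `--conf`), since under this model they are fixed once the executor starts.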

workload for spark

2014-09-12
We know that some of Spark's memory is used for computation (e.g., shuffle buffers) and some is used for caching RDDs for future use. Is there any existing workload that utilizes both of them? I want to do a performance study by adjusting the ratio between them.

Re: memory size for caching RDD

2014-09-04
at 8:13 PM, 牛兆捷 nzjem...@gmail.com wrote:
> Dear all: Spark uses memory to cache RDDs, and the size of that memory is specified by spark.storage.memoryFraction. Once the executor starts, does Spark support adjusting/resizing this part of memory dynamically? Thanks. -- Regards

Re: memory size for caching RDD

2014-09-04
Thanks, Raymond. I duplicated the question. Please see the reply here.
2014-09-04 14:27 GMT+08:00 牛兆捷 nzjem...@gmail.com:
> But is it possible to make it resizable? When we don't have many RDDs to cache, we can give some memory to others.
> 2014-09-04 13:45 GMT+08:00 Patrick Wendell pwend

Re: memory size for caching RDD

2014-09-04
spark.shuffle.memoryFraction, for which you also set the upper limit. Best Regards, Raymond Liu
From: 牛兆捷 [mailto:nzjem...@gmail.com]
Sent: Thursday, September 04, 2014 2:27 PM
To: Patrick Wendell
Cc: user@spark.apache.org; d...@spark.apache.org
Subject: Re: memory size for caching RDD

Re: memory size for caching RDD

2014-09-04
is that this is done per RDD, not per block. And if the storage level includes disk, the data on disk will be removed too. Best Regards, Raymond Liu
From: 牛兆捷 [mailto:nzjem...@gmail.com]
Sent: Thursday, September 04, 2014 2:57 PM
To: Liu, Raymond
Cc: Patrick Wendell; user

resize memory size for caching RDD

2014-09-03
Dear all: Spark uses memory to cache RDDs, and the size of that memory is specified by spark.storage.memoryFraction. Once the executor starts, does Spark support adjusting/resizing this part of memory dynamically? Thanks. -- Regards, Zhaojie