Re: Information on Spark UI

2014-06-11 Thread Daniel Darabos
About more succeeded tasks than total tasks:
 - This can happen if you have enabled speculative execution. Some
partitions can get processed multiple times.
 - More commonly, the result of the stage may be used in a later
calculation, and has to be recalculated. This happens if some of the
results were evicted from cache.
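(For context, speculative execution is off by default. A minimal sketch of enabling it in spark-defaults.conf, assuming the Spark 1.x property names; tune the thresholds for your workload:

```
# Launch speculative copies of slow tasks:
spark.speculation            true
# Only consider speculating once 75% of a stage's tasks are done...
spark.speculation.quantile   0.75
# ...and a task is running 1.5x slower than the stage median:
spark.speculation.multiplier 1.5
```

With this on, every successful task attempt is counted, so duplicated partitions can push the succeeded count past the total.)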


On Wed, Jun 11, 2014 at 2:23 AM, Shuo Xiang shuoxiang...@gmail.com wrote:

 Hi,
   I came across some confusing information on the Spark UI. The
 following was observed while factorizing a large matrix using ALS:
   1. Some stages have more succeeded tasks than total tasks (as
 displayed in the 5th column).
   2. Duplicate stages with exactly the same stage ID (stages 1/3/7).
   3. Clicking into some stages, some executors cannot be addressed.
 Does that mean an executor was lost, or does it not matter?

   Any explanations are appreciated!





Re: Information on Spark UI

2014-06-11 Thread Shuo Xiang
Daniel,
  Thanks for the explanation.








Re: Information on Spark UI

2014-06-11 Thread Shuo Xiang
Using MEMORY_AND_DISK_SER to persist the input RDD[Rating] seems to work
right for me now. I'm testing on a larger dataset and will see how it goes.
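(For anyone else hitting this, a minimal sketch of the persist call, assuming a live SparkContext, MLlib's ALS, and a hypothetical ratings file path; rank/iterations/lambda values are placeholders:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.storage.StorageLevel

def trainWithPersistedInput(sc: SparkContext) = {
  // Hypothetical input: one "user,item,score" triple per line.
  val ratings = sc.textFile("hdfs:///path/to/ratings.csv").map { line =>
    val Array(user, item, score) = line.split(',')
    Rating(user.toInt, item.toInt, score.toDouble)
  }

  // MEMORY_AND_DISK_SER keeps partitions serialized in memory and spills
  // them to disk instead of evicting, so the input does not have to be
  // recomputed from scratch each ALS iteration when the cache fills up.
  ratings.persist(StorageLevel.MEMORY_AND_DISK_SER)

  ALS.train(ratings, 20 /* rank */, 20 /* iterations */, 0.01 /* lambda */)
}
```

The serialized level trades some CPU for a much smaller memory footprint, which matters for a large Rating RDD.)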


On Wed, Jun 11, 2014 at 9:56 AM, Neville Li neville@gmail.com wrote:

 Does cache eviction affect disk storage level too? I tried cranking up
 replication but still seeing this.









Re: Information on Spark UI

2014-06-10 Thread coderxiang
The executors shown as CANNOT FIND ADDRESS are not listed in the Executors
tab at the top of the Spark UI.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Information-on-Spark-UI-tp7354p7355.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Information on Spark UI

2014-06-10 Thread Neville Li
We are seeing this issue as well.
We run on YARN and see logs about lost executors. It looks like some stages
had to be re-run to recompute RDD partitions that were lost with the executors.

We were able to complete 20 iterations with 20% full matrix but not beyond
that (total  100GB).

