[jira] [Created] (SPARK-4585) Spark dynamic scaling executors use upper limit value as default.

2014-11-24 Thread Chengxiang Li (JIRA)
Chengxiang Li created SPARK-4585:


 Summary: Spark dynamic scaling executors use upper limit value as 
default.
 Key: SPARK-4585
 URL: https://issues.apache.org/jira/browse/SPARK-4585
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.1.0
Reporter: Chengxiang Li


With SPARK-3174, one can configure a minimum and maximum number of executors 
for a Spark application on Yarn. However, the application always starts with 
the maximum. It seems more reasonable, at least for Hive on Spark, to start 
from the minimum and scale up as needed up to the maximum.
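
For reference, a minimal sketch of the settings involved (a SparkConf-based
setup; values are illustrative):
{noformat}
import org.apache.spark.{SparkConf, SparkContext}

// Dynamic-allocation bounds: with only min/max configured, the application
// currently starts at the maximum instead of the minimum this issue asks for.
val conf = new SparkConf()
  .setAppName("dynamic-allocation-example")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")      // required for dynamic allocation on YARN
  .set("spark.dynamicAllocation.minExecutors", "2")  // desired starting point
  .set("spark.dynamicAllocation.maxExecutors", "50") // actual starting point today
val sc = new SparkContext(conf)
{noformat}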






[jira] [Created] (SPARK-4955) Executor does not get killed after configured interval.

2014-12-24 Thread Chengxiang Li (JIRA)
Chengxiang Li created SPARK-4955:


 Summary: Executor does not get killed after configured interval.
 Key: SPARK-4955
 URL: https://issues.apache.org/jira/browse/SPARK-4955
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.2.0
Reporter: Chengxiang Li


With dynamic executor scaling enabled in yarn-cluster mode, after a query 
finishes and the spark.dynamicAllocation.executorIdleTimeout interval elapses, 
the executor count is not reduced to the configured minimum.
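
For context, a minimal sketch of the configuration under which this is
observed (values illustrative; in Spark 1.2 the idle timeout is given in
seconds):
{noformat}
// After a query finishes and executors sit idle past executorIdleTimeout,
// the executor count should fall back to minExecutors; in yarn-cluster mode
// this reduction does not happen.
val conf = new org.apache.spark.SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "10")
  .set("spark.dynamicAllocation.executorIdleTimeout", "60")
{noformat}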






[jira] [Commented] (SPARK-4955) Executor does not get killed after configured interval.

2014-12-24 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258138#comment-14258138
 ] 

Chengxiang Li commented on SPARK-4955:
--

I verified this feature with Hive on Spark. It works well in yarn-client mode, 
but fails to reduce to the minimum executor number after the idle timeout in 
yarn-cluster mode. Here is the related ApplicationMaster log:
{noformat}
14/12/24 16:12:50 INFO cluster.YarnClusterSchedulerBackend: Requesting to kill 
executor(s) 2
14/12/24 16:12:50 WARN cluster.YarnClusterSchedulerBackend: Attempted to kill 
executors before the AM has registered!
14/12/24 16:12:50 WARN spark.ExecutorAllocationManager: Unable to reach the 
cluster manager to kill executor 2!
14/12/24 16:12:50 INFO cluster.YarnClusterSchedulerBackend: Requesting to kill 
executor(s) 1
14/12/24 16:12:50 WARN cluster.YarnClusterSchedulerBackend: Attempted to kill 
executors before the AM has registered!
14/12/24 16:12:50 WARN spark.ExecutorAllocationManager: Unable to reach the 
cluster manager to kill executor 1!
14/12/24 16:12:50 INFO cluster.YarnClusterSchedulerBackend: Requesting to kill 
executor(s) 4
14/12/24 16:12:50 WARN cluster.YarnClusterSchedulerBackend: Attempted to kill 
executors before the AM has registered!
14/12/24 16:12:50 WARN spark.ExecutorAllocationManager: Unable to reach the 
cluster manager to kill executor 4!
14/12/24 16:12:50 INFO cluster.YarnClusterSchedulerBackend: Requesting to kill 
executor(s) 3
14/12/24 16:12:50 WARN cluster.YarnClusterSchedulerBackend: Attempted to kill 
executors before the AM has registered!
14/12/24 16:12:50 WARN spark.ExecutorAllocationManager: Unable to reach the 
cluster manager to kill executor 3!
{noformat}

> Executor does not get killed after configured interval.
> ---
>
> Key: SPARK-4955
> URL: https://issues.apache.org/jira/browse/SPARK-4955
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.2.0
>Reporter: Chengxiang Li
>
> With dynamic executor scaling enabled in yarn-cluster mode, after a query 
> finishes and the spark.dynamicAllocation.executorIdleTimeout interval elapses, 
> the executor count is not reduced to the configured minimum.






[jira] [Commented] (SPARK-4955) Executor does not get killed after configured interval.

2014-12-24 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258139#comment-14258139
 ] 

Chengxiang Li commented on SPARK-4955:
--

cc:[~andrewor14]

> Executor does not get killed after configured interval.
> ---
>
> Key: SPARK-4955
> URL: https://issues.apache.org/jira/browse/SPARK-4955
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.2.0
>Reporter: Chengxiang Li
>
> With dynamic executor scaling enabled in yarn-cluster mode, after a query 
> finishes and the spark.dynamicAllocation.executorIdleTimeout interval elapses, 
> the executor count is not reduced to the configured minimum.






[jira] [Created] (SPARK-3199) native Java spark listener API support

2014-08-25 Thread Chengxiang Li (JIRA)
Chengxiang Li created SPARK-3199:


 Summary: native Java spark listener API support
 Key: SPARK-3199
 URL: https://issues.apache.org/jira/browse/SPARK-3199
 Project: Spark
  Issue Type: New Feature
  Components: Java API
Reporter: Chengxiang Li


The current Spark listener API is entirely Scala-style, full of case classes 
and Scala collections; a native Java Spark listener API would be much 
friendlier for Java users.
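
To illustrate the friction, a minimal Scala sketch of the current listener API
that a Java caller has to mirror (case-class events carrying Scala
collections):
{noformat}
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerJobStart}

// Natural in Scala; from Java, the SparkListener* case classes and the Scala
// types they carry (e.g. the Seq[Int] of stage ids) are awkward to consume.
class LoggingListener extends SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit =
    println(s"job ${jobStart.jobId} started, stages ${jobStart.stageIds.mkString(",")}")
  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit =
    println(s"job ${jobEnd.jobId} ended: ${jobEnd.jobResult}")
}
// registration: sc.addSparkListener(new LoggingListener)
{noformat}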






[jira] [Updated] (SPARK-2633) enhance spark listener API to gather more spark job information

2014-08-25 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated SPARK-2633:
-

Summary: enhance spark listener API to gather more spark job information  
(was: support register spark listener to listener bus with Java API)

> enhance spark listener API to gather more spark job information
> ---
>
> Key: SPARK-2633
> URL: https://issues.apache.org/jira/browse/SPARK-2633
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Chengxiang Li
>Priority: Critical
>  Labels: hive
> Attachments: Spark listener enhancement for Hive on Spark job monitor 
> and statistic.docx
>
>
> Currently users can only register a Spark listener with the Scala API; we 
> should add this feature to the Java API as well.






[jira] [Updated] (SPARK-2633) enhance spark listener API to gather more spark job information

2014-08-25 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated SPARK-2633:
-

Description: Based on the Hive on Spark job status monitoring and statistics 
collection requirements, try to enhance the Spark listener API to gather more 
Spark job information.  (was: Currently users can only register a Spark 
listener with the Scala API; we should add this feature to the Java API as 
well.)

> enhance spark listener API to gather more spark job information
> ---
>
> Key: SPARK-2633
> URL: https://issues.apache.org/jira/browse/SPARK-2633
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Chengxiang Li
>Priority: Critical
>  Labels: hive
> Attachments: Spark listener enhancement for Hive on Spark job monitor 
> and statistic.docx
>
>
> Based on the Hive on Spark job status monitoring and statistics collection 
> requirements, try to enhance the Spark listener API to gather more Spark job 
> information.






[jira] [Commented] (SPARK-2633) enhance spark listener API to gather more spark job information

2014-08-25 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108887#comment-14108887
 ] 

Chengxiang Li commented on SPARK-2633:
--

I'm going to start working on this issue. For better isolation, this JIRA will 
focus on the Spark listener API enhancement, and I've created SPARK-3199 to 
track the native Java Spark listener API implementation.

> enhance spark listener API to gather more spark job information
> ---
>
> Key: SPARK-2633
> URL: https://issues.apache.org/jira/browse/SPARK-2633
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Chengxiang Li
>Priority: Critical
>  Labels: hive
> Attachments: Spark listener enhancement for Hive on Spark job monitor 
> and statistic.docx
>
>
> Based on the Hive on Spark job status monitoring and statistics collection 
> requirements, try to enhance the Spark listener API to gather more Spark job 
> information.






[jira] [Commented] (SPARK-2633) enhance spark listener API to gather more spark job information

2014-08-28 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113611#comment-14113611
 ] 

Chengxiang Li commented on SPARK-2633:
--

It's quite subjective, I think: Hive on MR displays job progress as the 
percentage of finished tasks, while Hive on Tez displays job progress with 
exact running/failed/finished task counts. I think it's better to collect more 
detailed job status information, as long as doing so does not introduce much 
extra effort.

> enhance spark listener API to gather more spark job information
> ---
>
> Key: SPARK-2633
> URL: https://issues.apache.org/jira/browse/SPARK-2633
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Chengxiang Li
>Priority: Critical
>  Labels: hive
> Attachments: Spark listener enhancement for Hive on Spark job monitor 
> and statistic.docx
>
>
> Based on the Hive on Spark job status monitoring and statistics collection 
> requirements, try to enhance the Spark listener API to gather more Spark job 
> information.






[jira] [Updated] (SPARK-2633) enhance spark listener API to gather more spark job information

2014-08-29 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated SPARK-2633:
-

Attachment: (was: Spark listener enhancement for Hive on Spark job 
monitor and statistic.docx)

> enhance spark listener API to gather more spark job information
> ---
>
> Key: SPARK-2633
> URL: https://issues.apache.org/jira/browse/SPARK-2633
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Chengxiang Li
>Priority: Critical
>  Labels: hive
> Attachments: Spark listener enhancement for Hive on Spark job monitor 
> and statistic.docx
>
>
> Based on the Hive on Spark job status monitoring and statistics collection 
> requirements, try to enhance the Spark listener API to gather more Spark job 
> information.






[jira] [Updated] (SPARK-2633) enhance spark listener API to gather more spark job information

2014-08-29 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated SPARK-2633:
-

Attachment: Spark listener enhancement for Hive on Spark job monitor and 
statistic.docx

> enhance spark listener API to gather more spark job information
> ---
>
> Key: SPARK-2633
> URL: https://issues.apache.org/jira/browse/SPARK-2633
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Chengxiang Li
>Priority: Critical
>  Labels: hive
> Attachments: Spark listener enhancement for Hive on Spark job monitor 
> and statistic.docx
>
>
> Based on the Hive on Spark job status monitoring and statistics collection 
> requirements, try to enhance the Spark listener API to gather more Spark job 
> information.






[jira] [Commented] (SPARK-2895) Support mapPartitionsWithContext in Spark Java API

2014-08-29 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116221#comment-14116221
 ] 

Chengxiang Li commented on SPARK-2895:
--

Pull request: [https://github.com/apache/spark/pull/2194]

> Support mapPartitionsWithContext in Spark Java API
> --
>
> Key: SPARK-2895
> URL: https://issues.apache.org/jira/browse/SPARK-2895
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Chengxiang Li
>  Labels: hive
>
> This is a requirement from Hive on Spark: mapPartitionsWithContext only 
> exists in the Spark Scala API, and we expect to access it from the Spark 
> Java API.






[jira] [Updated] (SPARK-2895) Support mapPartitionsWithContext in Spark Java API

2014-09-01 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated SPARK-2895:
-
Description: 
This is a requirement from Hive on Spark: mapPartitionsWithContext only exists 
in the Spark Scala API, and we expect to access it from the Spark Java API. 
For HIVE-7627 and HIVE-7843, Hive operators that are invoked in a 
mapPartitions closure need to get the taskId.

  was:This is a requirement from Hive on Spark: mapPartitionsWithContext only 
exists in the Spark Scala API, and we expect to access it from the Spark Java 
API.


> Support mapPartitionsWithContext in Spark Java API
> --
>
> Key: SPARK-2895
> URL: https://issues.apache.org/jira/browse/SPARK-2895
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: hive
>
> This is a requirement from Hive on Spark: mapPartitionsWithContext only 
> exists in the Spark Scala API, and we expect to access it from the Spark 
> Java API. 
> For HIVE-7627 and HIVE-7843, Hive operators that are invoked in a 
> mapPartitions closure need to get the taskId.






[jira] [Updated] (SPARK-2633) enhance spark listener API to gather more spark job information

2014-09-01 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated SPARK-2633:
-
Attachment: Spark listener enhancement for Hive on Spark job monitor and 
statistic.docx

> enhance spark listener API to gather more spark job information
> ---
>
> Key: SPARK-2633
> URL: https://issues.apache.org/jira/browse/SPARK-2633
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Chengxiang Li
>Priority: Critical
>  Labels: hive
> Attachments: Spark listener enhancement for Hive on Spark job monitor 
> and statistic.docx
>
>
> Based on the Hive on Spark job status monitoring and statistics collection 
> requirements, try to enhance the Spark listener API to gather more Spark job 
> information.






[jira] [Updated] (SPARK-2633) enhance spark listener API to gather more spark job information

2014-09-01 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated SPARK-2633:
-
Attachment: (was: Spark listener enhancement for Hive on Spark job 
monitor and statistic.docx)

> enhance spark listener API to gather more spark job information
> ---
>
> Key: SPARK-2633
> URL: https://issues.apache.org/jira/browse/SPARK-2633
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Chengxiang Li
>Priority: Critical
>  Labels: hive
> Attachments: Spark listener enhancement for Hive on Spark job monitor 
> and statistic.docx
>
>
> Based on the Hive on Spark job status monitoring and statistics collection 
> requirements, try to enhance the Spark listener API to gather more Spark job 
> information.






[jira] [Commented] (SPARK-2321) Design a proper progress reporting & event listener API

2014-09-04 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121085#comment-14121085
 ] 

Chengxiang Li commented on SPARK-2321:
--

I've collected some Hive-side requirements here, which should be helpful for 
the Spark job status and statistics API design.

Hive should be able to get the following job status information through the 
Spark job status API:
1. job identifier
2. current job execution state, which should include RUNNING/SUCCEEDED/FAILED/KILLED.
3. running/failed/killed/total task counts at the job level.
4. stage identifier
5. stage state, which should include RUNNING/SUCCEEDED/FAILED/KILLED.
6. running/failed/killed/total task counts at the stage level.

MR/Tez use Counters to collect statistics. Similar to MR/Tez Counters, it 
would be better if the Spark job statistics API organized statistics with:
1. grouping of statistics of the same kind by groupName.
2. a displayName for both groups and individual statistics, giving frontends 
(Web UI/Hive CLI/...) a uniform string to print.
A sketch of what such an API could look like is shown below.
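
Concretely (getter-only interfaces; all names hypothetical, not an actual
Spark API):
{noformat}
// Job-level and stage-level status, read-only, per the list above.
sealed trait JobExecutionState
object JobExecutionState {
  case object RUNNING extends JobExecutionState
  case object SUCCEEDED extends JobExecutionState
  case object FAILED extends JobExecutionState
  case object KILLED extends JobExecutionState
}

trait SparkJobStatus {
  def jobId: Int
  def state: JobExecutionState
  def runningTasks: Int; def failedTasks: Int; def killedTasks: Int; def totalTasks: Int
  def stages: Seq[SparkStageStatus]
}

trait SparkStageStatus {
  def stageId: Int
  def state: JobExecutionState
  def runningTasks: Int; def failedTasks: Int; def killedTasks: Int; def totalTasks: Int
}

// Counter-style statistics: grouped by groupName, with displayName giving
// frontends (Web UI / Hive CLI / ...) a uniform string to print.
trait SparkStatistic { def name: String; def displayName: String; def value: Long }
trait SparkStatisticGroup {
  def groupName: String; def displayName: String
  def statistics: Seq[SparkStatistic]
}
{noformat}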


> Design a proper progress reporting & event listener API
> ---
>
> Key: SPARK-2321
> URL: https://issues.apache.org/jira/browse/SPARK-2321
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API, Spark Core
>Affects Versions: 1.0.0
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>Priority: Critical
>
> This is a ticket to track progress on redesigning the SparkListener and 
> JobProgressListener API.
> There are multiple problems with the current design, including:
> 0. I'm not sure if the API is usable in Java (there are at least some enums 
> we used in Scala and a bunch of case classes that might complicate things).
> 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of 
> attention to it yet. Something as important as progress reporting deserves a 
> more stable API.
> 2. There is no easy way to connect jobs with stages. Similarly, there is no 
> easy way to connect job groups with jobs / stages.
> 3. JobProgressListener itself has no encapsulation at all. States can be 
> arbitrarily mutated by external programs. Variable names are sort of randomly 
> decided and inconsistent. 
> We should just revisit these and propose a new, concrete design. 






[jira] [Commented] (SPARK-2321) Design a proper progress reporting & event listener API

2014-09-04 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121173#comment-14121173
 ] 

Chengxiang Li commented on SPARK-2321:
--

I'm not sure whether I understand you correctly; here is my thinking on the 
API design:
# The JobStatus/JobStatistic API contains only getter methods.
# JobProgressListener holds JobStatusImpl/JobStatisticImpl instances.
# DAGScheduler posts events to JobProgressListener through the listener bus.
# Callers get JobStatusImpl/JobStatisticImpl with updated state from 
JobProgressListener.

So I think it should be a pull-style API, as sketched below.
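
Concretely, a sketch of the pull-style flow (names hypothetical; the point is
that the caller polls an immutable snapshot rather than receiving push
callbacks):
{noformat}
// Hypothetical pull-style usage: DAGScheduler pushes events to the listener,
// and the caller polls the listener for an updated, read-only snapshot.
case class JobStatusSnapshot(jobId: Int, state: String, finishedTasks: Int, totalTasks: Int)

trait JobStatusSource {
  def statusOf(jobId: Int): Option[JobStatusSnapshot]
}

def pollUntilDone(source: JobStatusSource, jobId: Int): Unit = {
  var snapshot = source.statusOf(jobId)
  while (snapshot.exists(_.state == "RUNNING")) {
    snapshot.foreach(s => println(s"job $jobId: ${s.finishedTasks}/${s.totalTasks} tasks"))
    Thread.sleep(1000)
    snapshot = source.statusOf(jobId)
  }
}
{noformat}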

> Design a proper progress reporting & event listener API
> ---
>
> Key: SPARK-2321
> URL: https://issues.apache.org/jira/browse/SPARK-2321
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API, Spark Core
>Affects Versions: 1.0.0
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>Priority: Critical
>
> This is a ticket to track progress on redesigning the SparkListener and 
> JobProgressListener API.
> There are multiple problems with the current design, including:
> 0. I'm not sure if the API is usable in Java (there are at least some enums 
> we used in Scala and a bunch of case classes that might complicate things).
> 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of 
> attention to it yet. Something as important as progress reporting deserves a 
> more stable API.
> 2. There is no easy way to connect jobs with stages. Similarly, there is no 
> easy way to connect job groups with jobs / stages.
> 3. JobProgressListener itself has no encapsulation at all. States can be 
> arbitrarily mutated by external programs. Variable names are sort of randomly 
> decided and inconsistent. 
> We should just revisit these and propose a new, concrete design. 






[jira] [Commented] (SPARK-3543) Write TaskContext in Java and expose it through a static accessor

2014-09-15 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135016#comment-14135016
 ] 

Chengxiang Li commented on SPARK-3543:
--

I think this would solve SPARK-2895 as well; glad to have it.
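
For illustration, a sketch of what the static accessor enables inside a plain
mapPartitions closure (assuming an existing SparkContext sc):
{noformat}
import org.apache.spark.TaskContext

// With TaskContext.get(), a normal mapPartitions closure can reach the task
// context without any xWithContext variant, which is exactly what Hive on
// Spark needs in SPARK-2895.
val rdd = sc.parallelize(1 to 100)
val tagged = rdd.mapPartitions { iter =>
  val ctx = TaskContext.get() // the static accessor proposed here
  iter.map(x => (ctx.partitionId(), x))
}
{noformat}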

> Write TaskContext in Java and expose it through a static accessor
> -
>
> Key: SPARK-3543
> URL: https://issues.apache.org/jira/browse/SPARK-3543
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Patrick Wendell
>Assignee: Prashant Sharma
>Priority: Critical
>
> Right now we have these xWithContext methods and it's a bit awkward (for 
> instance, we don't support accessing taskContext from a normal map or filter 
> operation). I'd propose the following
> 1. Re-write TaskContext in Java - it's a simple class. It can still refer to 
> the scala version of TaskMetrics.
> 2. Have a static method `TaskContext.get()` which will return the current 
> in-scope TaskContext. Under the hood this uses a thread local variable 
> similar to SparkEnv that the Executor sets.
> 3. Deprecate all of the existing xWithContext methods.






[jira] [Commented] (SPARK-2321) Design a proper progress reporting & event listener API

2014-09-22 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143191#comment-14143191
 ] 

Chengxiang Li commented on SPARK-2321:
--

I agree that stable, immutable, and Java-friendly *Info classes should be part 
of this new API design. Beyond registering a new private SparkListener to 
collect JobInfo and getting it from SparkContext, I think there are 2 more 
issues that should be resolved.
# TaskInfo/StageInfo should be collected per job, but the current Stage/Task 
SparkListener events do not include any job id. The scheduler has the 
information to connect TaskInfo/StageInfo with a job, so maybe we should 
redesign the SparkListener event API and add the job id to Stage/Task events 
in the scheduler before posting them to the listener bus.
# Getting the job id after submitting a job, and getting job info by job id. 
Currently we can only get the job id in a very limited way: by executing Spark 
async actions backed by SimpleFutureAction.

> Design a proper progress reporting & event listener API
> ---
>
> Key: SPARK-2321
> URL: https://issues.apache.org/jira/browse/SPARK-2321
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API, Spark Core
>Affects Versions: 1.0.0
>Reporter: Reynold Xin
>Assignee: Josh Rosen
>Priority: Critical
>
> This is a ticket to track progress on redesigning the SparkListener and 
> JobProgressListener API.
> There are multiple problems with the current design, including:
> 0. I'm not sure if the API is usable in Java (there are at least some enums 
> we used in Scala and a bunch of case classes that might complicate things).
> 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of 
> attention to it yet. Something as important as progress reporting deserves a 
> more stable API.
> 2. There is no easy way to connect jobs with stages. Similarly, there is no 
> easy way to connect job groups with jobs / stages.
> 3. JobProgressListener itself has no encapsulation at all. States can be 
> arbitrarily mutated by external programs. Variable names are sort of randomly 
> decided and inconsistent. 
> We should just revisit these and propose a new, concrete design. 






[jira] [Created] (SPARK-1442) Add Window function support

2014-04-08 Thread Chengxiang Li (JIRA)
Chengxiang Li created SPARK-1442:


 Summary: Add Window function support
 Key: SPARK-1442
 URL: https://issues.apache.org/jira/browse/SPARK-1442
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Chengxiang Li


Similar to Hive, add window function support to Catalyst.
https://issues.apache.org/jira/browse/HIVE-4197
https://issues.apache.org/jira/browse/HIVE-896





[jira] [Created] (SPARK-2633) support register spark listener to listener bus with Java API

2014-07-22 Thread Chengxiang Li (JIRA)
Chengxiang Li created SPARK-2633:


 Summary: support register spark listener to listener bus with Java 
API
 Key: SPARK-2633
 URL: https://issues.apache.org/jira/browse/SPARK-2633
 Project: Spark
  Issue Type: New Feature
  Components: Java API
Reporter: Chengxiang Li


Currently users can only register a Spark listener with the Scala API; we 
should add this feature to the Java API as well.





[jira] [Commented] (SPARK-2633) support register spark listener to listener bus with Java API

2014-07-22 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071310#comment-14071310
 ] 

Chengxiang Li commented on SPARK-2633:
--

cc [~rxin] [~xuefuz]

> support register spark listener to listener bus with Java API
> -
>
> Key: SPARK-2633
> URL: https://issues.apache.org/jira/browse/SPARK-2633
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Chengxiang Li
>
> Currently users can only register a Spark listener with the Scala API; we 
> should add this feature to the Java API as well.





[jira] [Created] (SPARK-2636) no where to get job identifier while submit spark job through spark API

2014-07-22 Thread Chengxiang Li (JIRA)
Chengxiang Li created SPARK-2636:


 Summary: no where to get job identifier while submit spark job 
through spark API
 Key: SPARK-2636
 URL: https://issues.apache.org/jira/browse/SPARK-2636
 Project: Spark
  Issue Type: New Feature
Reporter: Chengxiang Li


In Hive on Spark, we want to track Spark job status through the Spark API. The 
basic idea is as follows:
# create a Hive-specific Spark listener and register it on the Spark listener bus.
# the Hive-specific Spark listener derives job status from Spark listener events.
# the Hive driver tracks job status through the Hive-specific Spark listener. 
The current problem is that the Hive driver needs a job identifier to track a 
specific job's status through the Spark listener, but there is no Spark API to 
get a job identifier (like a job id) when submitting a Spark job.
I think any other project that tries to track job status with the Spark API 
would suffer from this as well.
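
A minimal sketch of the listener half of this idea (HiveJobStateListener is a
hypothetical name; the missing job identifier is what prevents the driver from
looking up its own job here):
{noformat}
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerJobStart}
import scala.collection.mutable

// The listener can record status per jobId from events, but the submitter has
// no API that tells it which jobId belongs to the job it just submitted.
class HiveJobStateListener extends SparkListener {
  val states = mutable.Map[Int, String]()
  override def onJobStart(jobStart: SparkListenerJobStart): Unit =
    states(jobStart.jobId) = "STARTED"
  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit =
    states(jobEnd.jobId) = jobEnd.jobResult.toString
}
// registration: sc.addSparkListener(new HiveJobStateListener)
{noformat}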





[jira] [Updated] (SPARK-2636) no where to get job identifier while submit spark job through spark API

2014-07-22 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated SPARK-2636:
-

Description: 
In Hive on Spark, we want to track Spark job status through the Spark API. The 
basic idea is as follows:
# create a Hive-specific Spark listener and register it on the Spark listener bus.
# the Hive-specific Spark listener derives job status from Spark listener events.
# the Hive driver tracks job status through the Hive-specific Spark listener. 
The current problem is that the Hive driver needs a job identifier to track a 
specific job's status through the Spark listener, but there is no Spark API to 
get a job identifier (like a job id) when submitting a Spark job.

I think any other project that tries to track job status with the Spark API 
would suffer from this as well.

  was:
In Hive on Spark, we want to track Spark job status through the Spark API. The 
basic idea is as follows:
# create a Hive-specific Spark listener and register it on the Spark listener bus.
# the Hive-specific Spark listener derives job status from Spark listener events.
# the Hive driver tracks job status through the Hive-specific Spark listener. 
The current problem is that the Hive driver needs a job identifier to track a 
specific job's status through the Spark listener, but there is no Spark API to 
get a job identifier (like a job id) when submitting a Spark job.
I think any other project that tries to track job status with the Spark API 
would suffer from this as well.


> no where to get job identifier while submit spark job through spark API
> ---
>
> Key: SPARK-2636
> URL: https://issues.apache.org/jira/browse/SPARK-2636
> Project: Spark
>  Issue Type: New Feature
>Reporter: Chengxiang Li
>
> In Hive on Spark, we want to track Spark job status through the Spark API. 
> The basic idea is as follows:
> # create a Hive-specific Spark listener and register it on the Spark listener 
> bus.
> # the Hive-specific Spark listener derives job status from Spark listener 
> events.
> # the Hive driver tracks job status through the Hive-specific Spark listener. 
> The current problem is that the Hive driver needs a job identifier to track a 
> specific job's status through the Spark listener, but there is no Spark API 
> to get a job identifier (like a job id) when submitting a Spark job.
> I think any other project that tries to track job status with the Spark API 
> would suffer from this as well.





[jira] [Commented] (SPARK-2636) no where to get job identifier while submit spark job through spark API

2014-07-22 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071311#comment-14071311
 ] 

Chengxiang Li commented on SPARK-2636:
--

cc [~rxin] [~xuefuz]

> no where to get job identifier while submit spark job through spark API
> ---
>
> Key: SPARK-2636
> URL: https://issues.apache.org/jira/browse/SPARK-2636
> Project: Spark
>  Issue Type: New Feature
>Reporter: Chengxiang Li
>
> In Hive on Spark, we want to track Spark job status through the Spark API. 
> The basic idea is as follows:
> # create a Hive-specific Spark listener and register it on the Spark listener 
> bus.
> # the Hive-specific Spark listener derives job status from Spark listener 
> events.
> # the Hive driver tracks job status through the Hive-specific Spark listener. 
> The current problem is that the Hive driver needs a job identifier to track a 
> specific job's status through the Spark listener, but there is no Spark API 
> to get a job identifier (like a job id) when submitting a Spark job.
> I think any other project that tries to track job status with the Spark API 
> would suffer from this as well.





[jira] [Updated] (SPARK-2636) no where to get job identifier while submit spark job through spark API

2014-07-22 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated SPARK-2636:
-

Description: 
In Hive on Spark, we want to track Spark job status through the Spark API. The 
basic idea is as follows:
# create a Hive-specific Spark listener and register it on the Spark listener bus.
# the Hive-specific Spark listener derives job status from Spark listener events.
# the Hive driver tracks job status through the Hive-specific Spark listener. 

The current problem is that the Hive driver needs a job identifier to track a 
specific job's status through the Spark listener, but there is no Spark API to 
get a job identifier (like a job id) when submitting a Spark job.

I think any other project that tries to track job status with the Spark API 
would suffer from this as well.

  was:
In Hive on Spark, we want to track Spark job status through the Spark API. The 
basic idea is as follows:
# create a Hive-specific Spark listener and register it on the Spark listener bus.
# the Hive-specific Spark listener derives job status from Spark listener events.
# the Hive driver tracks job status through the Hive-specific Spark listener. 
The current problem is that the Hive driver needs a job identifier to track a 
specific job's status through the Spark listener, but there is no Spark API to 
get a job identifier (like a job id) when submitting a Spark job.

I think any other project that tries to track job status with the Spark API 
would suffer from this as well.


> no where to get job identifier while submit spark job through spark API
> ---
>
> Key: SPARK-2636
> URL: https://issues.apache.org/jira/browse/SPARK-2636
> Project: Spark
>  Issue Type: New Feature
>Reporter: Chengxiang Li
>
> In Hive on Spark, we want to track Spark job status through the Spark API. 
> The basic idea is as follows:
> # create a Hive-specific Spark listener and register it on the Spark listener 
> bus.
> # the Hive-specific Spark listener derives job status from Spark listener 
> events.
> # the Hive driver tracks job status through the Hive-specific Spark listener. 
> The current problem is that the Hive driver needs a job identifier to track a 
> specific job's status through the Spark listener, but there is no Spark API 
> to get a job identifier (like a job id) when submitting a Spark job.
> I think any other project that tries to track job status with the Spark API 
> would suffer from this as well.





[jira] [Commented] (SPARK-2633) support register spark listener to listener bus with Java API

2014-07-22 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071402#comment-14071402
 ] 

Chengxiang Li commented on SPARK-2633:
--

For Hive job status monitoring, the Spark listener exposes fewer job states 
than expected:

current:
* monitored events: onJobStart, onJobEnd (JobSucceeded/JobFailed)
* available states: Started/Succeeded/Failed

expected: 
* monitored events: +onJobSubmitted, onJobStart, 
onJobEnd (JobSucceeded/JobFailed/+JobKilled)
* available states: Submitted/Started/Succeeded/Failed/Killed

> support register spark listener to listener bus with Java API
> -
>
> Key: SPARK-2633
> URL: https://issues.apache.org/jira/browse/SPARK-2633
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Chengxiang Li
>
> Currently users can only register a Spark listener with the Scala API; we 
> should add this feature to the Java API as well.





[jira] [Commented] (SPARK-2633) support register spark listener to listener bus with Java API

2014-07-23 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071478#comment-14071478
 ] 

Chengxiang Li commented on SPARK-2633:
--

Two more points:
# The StageInfo class is not well designed for determining stage state; it 
should contain something like a StageEndReason.
# It would be lovely if the Spark Java API converted Scala collections into 
Java collections instead of leaving that to the user; for example, stageIds: 
Seq\[Int] in SparkListenerJobStart (see the sketch below).
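
For example, the conversion that currently falls on the consumer, which the
Java API could do on the user's behalf (a sketch using the standard Scala
converters):
{noformat}
import scala.collection.JavaConverters._
import org.apache.spark.scheduler.SparkListenerJobStart

// Exposing stageIds as a java.util.List instead of a Scala Seq is the kind of
// conversion a Java-friendly listener API would perform internally.
def stageIdsForJava(jobStart: SparkListenerJobStart): java.util.List[Int] =
  jobStart.stageIds.asJava
{noformat}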

> support register spark listener to listener bus with Java API
> -
>
> Key: SPARK-2633
> URL: https://issues.apache.org/jira/browse/SPARK-2633
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Chengxiang Li
>
> Currently users can only register a Spark listener with the Scala API; we 
> should add this feature to the Java API as well.





[jira] [Commented] (SPARK-2633) support register spark listener to listener bus with Java API

2014-07-23 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072807#comment-14072807
 ] 

Chengxiang Li commented on SPARK-2633:
--

Thanks [~vanzin], that's a very good point. From the perspective of Hive job 
monitoring, I think we may need 2 kinds of support from Spark:
# Full job-monitoring support through the Spark listener; as I mentioned in 
previous comments, we are unable to fetch enough information through the 
current Spark listener yet.
# A Java-friendly API, like what Spark has done with RDDs, instead of just 
adding a proxy method to JavaSparkContext for registering a Spark listener.

#1 is needed for the Hive on Spark job monitoring feature.
#2 is not strictly required, as the Scala-style Spark listener is workable, 
though not graceful, in Java, but a native Java Spark listener API would 
definitely be welcome.

> support register spark listener to listener bus with Java API
> -
>
> Key: SPARK-2633
> URL: https://issues.apache.org/jira/browse/SPARK-2633
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Chengxiang Li
>
> Currently users can only register a Spark listener with the Scala API; we 
> should add this feature to the Java API as well.





[jira] [Commented] (SPARK-2633) support register spark listener to listener bus with Java API

2014-07-30 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080425#comment-14080425
 ] 

Chengxiang Li commented on SPARK-2633:
--

Thanks, [~vanzin] 
{quote}
Registering listeners through Java API: although not super-clean, it can be 
done today easily. SparkListener is a pretty clean interface that should be 
easily implementable in Java-land. You can register it using 
JavaSparkContext.sc().addSparkListener(). Is that enough?
{quote}
It's workable, but maybe adding a proxy method such as 
JavaSparkContext.addSparkListener() would be more meaningful for a Java Spark 
listener API; what do you think?
{quote}
Events: SparkListenerJobStart is actually issued at job submission time (see 
DAGScheduler.scala::handleJobSubmitted, DAGScheduler.scala::submitJob. Not sure 
there's any other good substitute for it; the job has technically started - the 
DAGScheduler is processing it, and at some point might start executing tasks. I 
guess it depends on what is it that you're trying to achieve here...
{quote}
Previously I assumed that a Spark job has a submitted state before running, 
but it seems it does not. The DAGScheduler breaks a Spark job up into tasks 
after the job is submitted and adds them to a pending task list, waiting for 
executors to offer resources to execute them. So a Spark job waits to run at 
the task level instead of the job level. Regarding "TaskStarted" and 
"TaskSubmitted": do you think it's reasonable to use "TaskSubmitted" to 
represent the task state between task submission 
(TaskSchedulerImpl.scala::submitTasks()) and resource allocation 
(TaskSchedulerImpl.scala::resourceOffers())?

Besides, the current StageInfo holds too little information; we may need to 
add a state object inside, like StageEndReason(success/failed/killed/resubmitted), 
to represent the different kinds of stage end info.


> support register spark listener to listener bus with Java API
> -
>
> Key: SPARK-2633
> URL: https://issues.apache.org/jira/browse/SPARK-2633
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Chengxiang Li
>
> Currently users can only register a Spark listener with the Scala API; we 
> should add this feature to the Java API as well.





[jira] [Updated] (SPARK-2633) support register spark listener to listener bus with Java API

2014-07-30 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated SPARK-2633:
-

Attachment: Spark listener enhancement for Hive on Spark job monitor and 
statistic.docx

I've attached a doc collecting the requirements from the Hive on Spark side, 
since they may look messy spread across the comments. We can keep discussing 
based on this file.

> support register spark listener to listener bus with Java API
> -
>
> Key: SPARK-2633
> URL: https://issues.apache.org/jira/browse/SPARK-2633
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Chengxiang Li
> Attachments: Spark listener enhancement for Hive on Spark job monitor 
> and statistic.docx
>
>
> Currently users can only register a Spark listener with the Scala API; we 
> should add this feature to the Java API as well.





[jira] [Commented] (SPARK-2636) no where to get job identifier while submit spark job through spark API

2014-08-04 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085674#comment-14085674
 ] 

Chengxiang Li commented on SPARK-2636:
--

Thanks, [~vanzin]. Hive on Spark mainly tries to track job-level status, which 
can be collected from job/stage/task events, so environment-level events 
probably do not help much here. Currently our problem is that there is no 
Spark API to get a job identifier; Spark generates the jobId internally when a 
job is submitted, it is just not exposed at the API level. 
How would you expose the job id in the PR?

> no where to get job identifier while submit spark job through spark API
> ---
>
> Key: SPARK-2636
> URL: https://issues.apache.org/jira/browse/SPARK-2636
> Project: Spark
>  Issue Type: New Feature
>Reporter: Chengxiang Li
>
> In Hive on Spark, we want to track Spark job status through the Spark API. 
> The basic idea is as follows:
> # create a Hive-specific Spark listener and register it on the Spark listener 
> bus.
> # the Hive-specific Spark listener derives job status from Spark listener 
> events.
> # the Hive driver tracks job status through the Hive-specific Spark listener. 
> The current problem is that the Hive driver needs a job identifier to track a 
> specific job's status through the Spark listener, but there is no Spark API 
> to get a job identifier (like a job id) when submitting a Spark job.
> I think any other project that tries to track job status with the Spark API 
> would suffer from this as well.






[jira] [Commented] (SPARK-2636) no where to get job identifier while submit spark job through spark API

2014-08-05 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087102#comment-14087102
 ] 

Chengxiang Li commented on SPARK-2636:
--

{quote}
There are two ways I think. One is for DAGScheduler.runJob to return an integer 
(or long) id for the job. An alternative, which I think is better and relates 
to SPARK-2321, is for runJob to return some Job object that has information 
about the id and can be queried about progress.
{quote}
DAGScheduler is a Spark-internal class; users can hardly use it directly. I 
like your second idea: return a job info object when submitting a Spark job at 
the SparkContext (JavaSparkContext in this case) or RDD level. Actually, 
AsyncRDDActions has already done part of this work; I think it may be a good 
place to fix this issue. A sketch of what that could look like is below.
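
Building on the existing AsyncRDDActions (the jobIds accessor is the proposal
under discussion, not an existing API; assumes an existing SparkContext sc):
{noformat}
import org.apache.spark.SparkContext._ // implicit async conversions in Spark 1.x
import scala.concurrent.ExecutionContext.Implicits.global

val rdd = sc.parallelize(1 to 1000)
val future = rdd.countAsync()        // existing API: returns a FutureAction handle
// val ids = future.jobIds           // proposed: expose the underlying job id(s)
future.onComplete {
  case scala.util.Success(n) => println(s"count = $n")
  case scala.util.Failure(e) => println(s"job failed: $e")
}
{noformat}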

> no where to get job identifier while submit spark job through spark API
> ---
>
> Key: SPARK-2636
> URL: https://issues.apache.org/jira/browse/SPARK-2636
> Project: Spark
>  Issue Type: New Feature
>Reporter: Chengxiang Li
>
> In Hive on Spark, we want to track Spark job status through the Spark API. 
> The basic idea is as follows:
> # create a Hive-specific Spark listener and register it on the Spark listener 
> bus.
> # the Hive-specific Spark listener derives job status from Spark listener 
> events.
> # the Hive driver tracks job status through the Hive-specific Spark listener. 
> The current problem is that the Hive driver needs a job identifier to track a 
> specific job's status through the Spark listener, but there is no Spark API 
> to get a job identifier (like a job id) when submitting a Spark job.
> I think any other project that tries to track job status with the Spark API 
> would suffer from this as well.






[jira] [Created] (SPARK-2895) Support mapPartitionsWithContext in Spark Java API

2014-08-06 Thread Chengxiang Li (JIRA)
Chengxiang Li created SPARK-2895:


 Summary: Support mapPartitionsWithContext in Spark Java API
 Key: SPARK-2895
 URL: https://issues.apache.org/jira/browse/SPARK-2895
 Project: Spark
  Issue Type: New Feature
  Components: Java API
Reporter: Chengxiang Li


This is a requirement from Hive on Spark: mapPartitionsWithContext only exists 
in the Spark Scala API, and we expect to access it from the Spark Java API.
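
For reference, a sketch of the Scala-only operator in question (assuming an
existing SparkContext sc; in the 1.x releases this thread targets the method
is marked as a DeveloperApi):
{noformat}
import org.apache.spark.TaskContext

// Scala API only: the closure receives the TaskContext directly, which is how
// Hive operators would read task/partition information. No Java counterpart.
val rdd = sc.parallelize(1 to 100)
val tagged = rdd.mapPartitionsWithContext { (ctx: TaskContext, iter: Iterator[Int]) =>
  iter.map(x => (ctx.partitionId, x))
}
{noformat}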






[jira] [Commented] (SPARK-2895) Support mapPartitionsWithContext in Spark Java API

2014-08-06 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088736#comment-14088736
 ] 

Chengxiang Li commented on SPARK-2895:
--

cc [~rxin] [~brocknoland] [~szehon]

> Support mapPartitionsWithContext in Spark Java API
> --
>
> Key: SPARK-2895
> URL: https://issues.apache.org/jira/browse/SPARK-2895
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Chengxiang Li
>
> This is a requirement from Hive on Spark: mapPartitionsWithContext only 
> exists in the Spark Scala API, and we expect to access it from the Spark 
> Java API.






[jira] [Commented] (SPARK-2895) Support mapPartitionsWithContext in Spark Java API

2014-08-06 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088895#comment-14088895
 ] 

Chengxiang Li commented on SPARK-2895:
--

{quote}
Can we add the label "hive" to all the tickets related to the Hive on Spark 
project in the future? This should be fairly easy to do.
{quote}

Thanks, [~rxin], that makes sense; I will label all Hive on Spark related 
issues.

> Support mapPartitionsWithContext in Spark Java API
> --
>
> Key: SPARK-2895
> URL: https://issues.apache.org/jira/browse/SPARK-2895
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Chengxiang Li
>  Labels: hive
>
> This is a requirement from Hive on Spark: mapPartitionsWithContext only 
> exists in the Spark Scala API, and we expect to access it from the Spark 
> Java API.






[jira] [Updated] (SPARK-2633) support register spark listener to listener bus with Java API

2014-08-06 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated SPARK-2633:
-

Labels: hive  (was: )

> support register spark listener to listener bus with Java API
> -
>
> Key: SPARK-2633
> URL: https://issues.apache.org/jira/browse/SPARK-2633
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Chengxiang Li
>Priority: Critical
>  Labels: hive
> Attachments: Spark listener enhancement for Hive on Spark job monitor 
> and statistic.docx
>
>
> Currently users can only register a Spark listener with the Scala API; we 
> should add this feature to the Java API as well.






[jira] [Updated] (SPARK-2636) no where to get job identifier while submit spark job through spark API

2014-08-06 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated SPARK-2636:
-

Component/s: Java API

> no where to get job identifier while submit spark job through spark API
> ---
>
> Key: SPARK-2636
> URL: https://issues.apache.org/jira/browse/SPARK-2636
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Chengxiang Li
>  Labels: hive
>
> In Hive on Spark, we want to track Spark job status through the Spark API. 
> The basic idea is as follows:
> # create a Hive-specific Spark listener and register it on the Spark listener 
> bus.
> # the Hive-specific Spark listener derives job status from Spark listener 
> events.
> # the Hive driver tracks job status through the Hive-specific Spark listener. 
> The current problem is that the Hive driver needs a job identifier to track a 
> specific job's status through the Spark listener, but there is no Spark API 
> to get a job identifier (like a job id) when submitting a Spark job.
> I think any other project that tries to track job status with the Spark API 
> would suffer from this as well.






[jira] [Updated] (SPARK-2636) no where to get job identifier while submit spark job through spark API

2014-08-06 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated SPARK-2636:
-

Labels: hive  (was: )

> no where to get job identifier while submit spark job through spark API
> ---
>
> Key: SPARK-2636
> URL: https://issues.apache.org/jira/browse/SPARK-2636
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Chengxiang Li
>  Labels: hive
>
> In Hive on Spark, we want to track Spark job status through the Spark API. 
> The basic idea is as follows:
> # create a Hive-specific Spark listener and register it on the Spark listener 
> bus.
> # the Hive-specific Spark listener derives job status from Spark listener 
> events.
> # the Hive driver tracks job status through the Hive-specific Spark listener. 
> The current problem is that the Hive driver needs a job identifier to track a 
> specific job's status through the Spark listener, but there is no Spark API 
> to get a job identifier (like a job id) when submitting a Spark job.
> I think any other project that tries to track job status with the Spark API 
> would suffer from this as well.






[jira] [Created] (SPARK-5377) Dynamically add jar into Spark Driver's classpath.

2015-01-22 Thread Chengxiang Li (JIRA)
Chengxiang Li created SPARK-5377:


 Summary: Dynamically add jar into Spark Driver's classpath.
 Key: SPARK-5377
 URL: https://issues.apache.org/jira/browse/SPARK-5377
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Chengxiang Li


Spark supports dynamically adding a jar to the executor classpath through 
SparkContext::addJar(), but it does not support dynamically adding a jar to 
the driver classpath. In most cases (if not all), users dynamically add a jar 
with SparkContext::addJar() because classes from the jar will be referenced in 
an upcoming Spark job, which means the classes need to be loadable on the 
Spark driver side as well, e.g. during serialization. I think it makes sense 
to add an API that adds a jar to the driver classpath, or just make 
SparkContext::addJar() do so. HIVE-9410 is a real case from Hive on Spark.
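
A sketch of the asymmetry being described (sc.addJar and the launch-time
driver classpath option are existing features; the runtime driver-side call is
the missing piece, shown only as a hypothetical):
{noformat}
// Executor side: works at runtime today.
sc.addJar("hdfs:///libs/my-udfs.jar")   // shipped to executors and put on their classpath

// Driver side: only static, launch-time options exist, e.g.
//   spark-submit --driver-class-path /local/libs/my-udfs.jar ...
// There is no runtime equivalent along the lines of a hypothetical
//   sc.addDriverJar("hdfs:///libs/my-udfs.jar")
// so classes from an addJar()-ed jar may fail to load on the driver,
// e.g. while (de)serializing task closures or results.
{noformat}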






[jira] [Commented] (SPARK-5377) Dynamically add jar into Spark Driver's classpath.

2015-01-22 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288783#comment-14288783
 ] 

Chengxiang Li commented on SPARK-5377:
--

cc [~xuefuz], [~rxin], [~Grace Huang].

> Dynamically add jar into Spark Driver's classpath.
> --
>
> Key: SPARK-5377
> URL: https://issues.apache.org/jira/browse/SPARK-5377
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Chengxiang Li
>
> Spark supports dynamically adding a jar to the executor classpath through 
> SparkContext::addJar(), but it does not support dynamically adding a jar to 
> the driver classpath. In most cases (if not all), users dynamically add a jar 
> with SparkContext::addJar() because classes from the jar will be referenced 
> in an upcoming Spark job, which means the classes need to be loadable on the 
> Spark driver side as well, e.g. during serialization. I think it makes sense 
> to add an API that adds a jar to the driver classpath, or just make 
> SparkContext::addJar() do so. HIVE-9410 is a real case from Hive on Spark.






[jira] [Commented] (SPARK-5439) Expose yarn app id for yarn mode

2015-01-28 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294872#comment-14294872
 ] 

Chengxiang Li commented on SPARK-5439:
--

I think the gap here is that, when launching a Hive CLI with Spark deployed on 
YARN, the user wants to know from the YARN web UI which YARN application is 
allocated for his Hive session, so that he can check its status through the 
ApplicationMaster tracking UI. 
[~vanzin], is the YARN application ID the same as the SparkContext 
applicationId? From my experience, it seems not. The YARN web UI shows that 
the YARN application ID follows the schema application_\*, while the 
SparkContext applicationId is set at SchedulerBackend#26 with the schema 
spark-application-*.

> Expose yarn app id for yarn mode
> 
>
> Key: SPARK-5439
> URL: https://issues.apache.org/jira/browse/SPARK-5439
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.0.0
>Reporter: bc Wong
>
> When submitting Spark apps on YARN, the caller should be able to get back the 
> YARN app ID programmatically. 


