[jira] [Commented] (SPARK-5377) Dynamically add jar into Spark Driver's classpath.

2018-02-23 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374841#comment-16374841 ] Xuefu Zhang commented on SPARK-5377: [~shay_elbaz] I think the issue was closed purely because no one

[jira] [Commented] (SPARK-22683) DynamicAllocation wastes resources by allocating containers that will barely be used

2018-02-09 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358759#comment-16358759 ] Xuefu Zhang commented on SPARK-22683: - On a side note, besides the name of the configuration that's

[jira] [Commented] (SPARK-22683) DynamicAllocation wastes resources by allocating containers that will barely be used

2018-02-07 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355634#comment-16355634 ] Xuefu Zhang commented on SPARK-22683: - +1 on the idea of including this. Also, +1 on renaming the

[jira] [Comment Edited] (SPARK-22765) Create a new executor allocation scheme based on that of MR

2017-12-21 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16300926#comment-16300926 ] Xuefu Zhang edited comment on SPARK-22765 at 12/22/17 4:43 AM: --- Did some

[jira] [Created] (SPARK-22870) Dynamic allocation should allow 0 idle time

2017-12-21 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created SPARK-22870: --- Summary: Dynamic allocation should allow 0 idle time Key: SPARK-22870 URL: https://issues.apache.org/jira/browse/SPARK-22870 Project: Spark Issue Type:

[jira] [Commented] (SPARK-22765) Create a new executor allocation scheme based on that of MR

2017-12-21 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16300926#comment-16300926 ] Xuefu Zhang commented on SPARK-22765: - Did some benchmarking with a set of 20 queries on upfront

[jira] [Commented] (SPARK-22765) Create a new executor allocation scheme based on that of MR

2017-12-20 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16298634#comment-16298634 ] Xuefu Zhang commented on SPARK-22765: - bq: at least based on this one experiment up front allocation

[jira] [Comment Edited] (SPARK-22765) Create a new executor allocation scheme based on that of MR

2017-12-19 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297914#comment-16297914 ] Xuefu Zhang edited comment on SPARK-22765 at 12/20/17 5:45 AM: --- Alright, I

[jira] [Commented] (SPARK-22765) Create a new executor allocation scheme based on that of MR

2017-12-19 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297914#comment-16297914 ] Xuefu Zhang commented on SPARK-22765: - Alright, I tested upfront allocation and its combinations with

[jira] [Commented] (SPARK-22765) Create a new executor allocation scheme based on that of MR

2017-12-19 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297645#comment-16297645 ] Xuefu Zhang commented on SPARK-22765: - Haven't got a chance to try upfront allocation. Tried one

[jira] [Commented] (SPARK-22765) Create a new executor allocation scheme based on that of MR

2017-12-19 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297529#comment-16297529 ] Xuefu Zhang commented on SPARK-22765: - {quote} SPARK-21656 and the dynamic allocation should handle

[jira] [Commented] (SPARK-22765) Create a new executor allocation scheme based on that of MR

2017-12-19 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297346#comment-16297346 ] Xuefu Zhang commented on SPARK-22765: - I'm not 100% positive, but that seems to be what I saw. I will

[jira] [Commented] (SPARK-22765) Create a new executor allocation scheme based on that of MR

2017-12-19 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297316#comment-16297316 ] Xuefu Zhang commented on SPARK-22765: - Actually I meant across parallel stages (those connected to

[jira] [Commented] (SPARK-22765) Create a new executor allocation scheme based on that of MR

2017-12-19 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297276#comment-16297276 ] Xuefu Zhang commented on SPARK-22765: - Hi [~CodingCat], yes, we adapted both #1 and #2, but we still

[jira] [Commented] (SPARK-22765) Create a new executor allocation scheme based on that of MR

2017-12-18 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295814#comment-16295814 ] Xuefu Zhang commented on SPARK-22765: - As an update, I managed to backport SPARK-21656, among a few

[jira] [Commented] (SPARK-22683) DynamicAllocation wastes resources by allocating containers that will barely be used

2017-12-18 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295770#comment-16295770 ] Xuefu Zhang commented on SPARK-22683: - I did some tests with the proposed change here and saw general

[jira] [Commented] (SPARK-22765) Create a new executor allocation scheme based on that of MR

2017-12-14 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292048#comment-16292048 ] Xuefu Zhang commented on SPARK-22765: - [~tgraves], I think it would help if SPARK-21656 can make a

[jira] [Commented] (SPARK-22765) Create a new executor allocation scheme based on that of MR

2017-12-13 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289726#comment-16289726 ] Xuefu Zhang commented on SPARK-22765: - Yes, we are using Hive on Spark. Our Spark version is 1.6.1,

[jira] [Commented] (SPARK-22683) DynamicAllocation wastes resources by allocating containers that will barely be used

2017-12-13 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289689#comment-16289689 ] Xuefu Zhang commented on SPARK-22683: - [~tgraves], I can speak on our use case, where same queries

[jira] [Comment Edited] (SPARK-22765) Create a new executor allocation scheme based on that of MR

2017-12-13 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289676#comment-16289676 ] Xuefu Zhang edited comment on SPARK-22765 at 12/13/17 6:30 PM: --- Hi

[jira] [Commented] (SPARK-22765) Create a new executor allocation scheme based on that of MR

2017-12-13 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289676#comment-16289676 ] Xuefu Zhang commented on SPARK-22765: - Hi [~tgraves], Thanks for your input. In our busy, heavily

[jira] [Commented] (SPARK-22765) Create a new executor allocation scheme based on that of MR

2017-12-12 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288414#comment-16288414 ] Xuefu Zhang commented on SPARK-22765: - I wouldn't say that MR is static, at least not static in

[jira] [Created] (SPARK-22765) Create a new executor allocation scheme based on that of MR

2017-12-12 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created SPARK-22765: --- Summary: Create a new executor allocation scheme based on that of MR Key: SPARK-22765 URL: https://issues.apache.org/jira/browse/SPARK-22765 Project: Spark

[jira] [Commented] (SPARK-22683) Allow tuning the number of dynamically allocated executors wrt task number

2017-12-12 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288262#comment-16288262 ] Xuefu Zhang commented on SPARK-22683: - Hi [~jcuquemelle], Thanks for working on this and bringing up

[jira] [Commented] (SPARK-20640) Make rpc timeout and retry for shuffle registration configurable

2017-12-11 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286807#comment-16286807 ] Xuefu Zhang commented on SPARK-20640: - [~lyc], thanks for fixing this. I'm wondering if you have any

[jira] [Commented] (SPARK-20662) Block jobs that have greater than a configured number of tasks

2017-06-02 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035519#comment-16035519 ] Xuefu Zhang commented on SPARK-20662: - I can understand the counter argument here if Spark is

[jira] [Commented] (SPARK-20662) Block jobs that have greater than a configured number of tasks

2017-06-02 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035462#comment-16035462 ] Xuefu Zhang commented on SPARK-20662: - [~lyc] I'm talking about mapreduce.job.max.map, which is the

[jira] [Created] (SPARK-20662) Block jobs that have greater than a configured number of tasks

2017-05-08 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created SPARK-20662: --- Summary: Block jobs that have greater than a configured number of tasks Key: SPARK-20662 URL: https://issues.apache.org/jira/browse/SPARK-20662 Project: Spark

[jira] [Commented] (SPARK-18769) Spark to be smarter about what the upper bound is and to restrict number of executor when dynamic allocation is enabled

2017-03-01 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891610#comment-15891610 ] Xuefu Zhang commented on SPARK-18769: - Just as fyi, the problem is real and happens when allocation

[jira] [Commented] (SPARK-2421) Spark should treat writable as serializable for keys

2016-02-16 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148684#comment-15148684 ] Xuefu Zhang commented on SPARK-2421: [~sowen], I saw you had closed this without giving any

[jira] [Commented] (SPARK-5377) Dynamically add jar into Spark Driver's classpath.

2016-02-16 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148675#comment-15148675 ] Xuefu Zhang commented on SPARK-5377: [~sowen], I saw you had closed this without giving any

[jira] [Commented] (SPARK-3621) Provide a way to broadcast an RDD (instead of just a variable made of the RDD) so that a job can access

2015-03-02 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14343977#comment-14343977 ] Xuefu Zhang commented on SPARK-3621: {quote} you can go a step further if you wanted

[jira] [Commented] (SPARK-3621) Provide a way to broadcast an RDD (instead of just a variable made of the RDD) so that a job can access

2015-03-02 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14343582#comment-14343582 ] Xuefu Zhang commented on SPARK-3621: For Hive's map join, we create a hash table out

[jira] [Commented] (SPARK-3621) Provide a way to broadcast an RDD (instead of just a variable made of the RDD) so that a job can access

2015-03-02 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14343655#comment-14343655 ] Xuefu Zhang commented on SPARK-3621: addFile() can take a HDFS file, for which case,

[jira] [Commented] (SPARK-3691) Provide a mini cluster for testing system built on Spark

2015-02-27 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14340950#comment-14340950 ] Xuefu Zhang commented on SPARK-3691: Hive is using spark.master=local-cluster for unit

[jira] [Closed] (SPARK-3691) Provide a mini cluster for testing system built on Spark

2015-02-27 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang closed SPARK-3691. -- Resolution: Won't Fix Provide a mini cluster for testing system built on Spark

[jira] [Commented] (SPARK-5439) Expose yarn app id for yarn mode

2015-01-27 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14294406#comment-14294406 ] Xuefu Zhang commented on SPARK-5439: Yeah. This is certainly something desirable in

[jira] [Commented] (SPARK-2688) Need a way to run multiple data pipeline concurrently

2015-01-26 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292448#comment-14292448 ] Xuefu Zhang commented on SPARK-2688: Yeah. We don't need a syntactic sugar, but a

[jira] [Commented] (SPARK-2688) Need a way to run multiple data pipeline concurrently

2015-01-26 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292415#comment-14292415 ] Xuefu Zhang commented on SPARK-2688: #1 above is exactly what Hive needs badly. Need

[jira] [Commented] (SPARK-3621) Provide a way to broadcast an RDD (instead of just a variable made of the RDD) so that a job can access

2015-01-25 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291145#comment-14291145 ] Xuefu Zhang commented on SPARK-3621: I'm not sure if I agree that this is not a

[jira] [Reopened] (SPARK-3621) Provide a way to broadcast an RDD (instead of just a variable made of the RDD) so that a job can access

2015-01-25 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang reopened SPARK-3621: Provide a way to broadcast an RDD (instead of just a variable made of the RDD) so that a job can

[jira] [Commented] (SPARK-2688) Need a way to run multiple data pipeline concurrently

2015-01-25 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291134#comment-14291134 ] Xuefu Zhang commented on SPARK-2688: I think SPARK-3622 is related to this JIRA but

[jira] [Commented] (SPARK-3622) Provide a custom transformation that can output multiple RDDs

2015-01-22 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287901#comment-14287901 ] Xuefu Zhang commented on SPARK-3622: Unfortunately no. In Hive the problem is more

[jira] [Commented] (SPARK-5377) Dynamically add jar into Spark Driver's classpath.

2015-01-22 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288866#comment-14288866 ] Xuefu Zhang commented on SPARK-5377: cc [~sandyr]. Dynamically add jar into Spark

[jira] [Commented] (SPARK-1021) sortByKey() launches a cluster job when it shouldn't

2015-01-16 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281168#comment-14281168 ] Xuefu Zhang commented on SPARK-1021: This problem also occurred on Hive on Spark

[jira] [Commented] (SPARK-5080) Expose more cluster resource information to user

2015-01-07 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268786#comment-14268786 ] Xuefu Zhang commented on SPARK-5080: cc: [~sandyr] Expose more cluster resource

[jira] [Commented] (SPARK-4921) Performance issue caused by TaskSetManager returning PROCESS_LOCAL for NO_PREF tasks

2014-12-30 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14261436#comment-14261436 ] Xuefu Zhang commented on SPARK-4921: Some will be NODE_LOCAL, but others will be

[jira] [Commented] (SPARK-2387) Remove the stage barrier for better resource utilization

2014-12-24 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14258626#comment-14258626 ] Xuefu Zhang commented on SPARK-2387: cc: [~sandyr] I think the purpose of this

[jira] [Created] (SPARK-4921) Performance issue caused by TaskSetManager returning PROCESS_LOCAL for NO_PREF tasks

2014-12-22 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created SPARK-4921: -- Summary: Performance issue caused by TaskSetManager returning PROCESS_LOCAL for NO_PREF tasks Key: SPARK-4921 URL: https://issues.apache.org/jira/browse/SPARK-4921

[jira] [Commented] (SPARK-4921) Performance issue caused by TaskSetManager returning PROCESS_LOCAL for NO_PREF tasks

2014-12-22 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256015#comment-14256015 ] Xuefu Zhang commented on SPARK-4921: cc: [~lirui], [~sandyr] Performance issue

[jira] [Updated] (SPARK-4921) Performance issue caused by TaskSetManager returning PROCESS_LOCAL for NO_PREF tasks

2014-12-22 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated SPARK-4921: --- Attachment: NO_PREF.patch Performance issue caused by TaskSetManager returning PROCESS_LOCAL for

[jira] [Commented] (SPARK-4687) SparkContext#addFile doesn't keep file folder information

2014-12-01 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230873#comment-14230873 ] Xuefu Zhang commented on SPARK-4687: [~jxiang], alternatively, would a new method,

[jira] [Commented] (SPARK-4567) Make SparkJobInfo and SparkStageInfo serializable

2014-11-24 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14222929#comment-14222929 ] Xuefu Zhang commented on SPARK-4567: {quote} please don't set the FixVersion field.

[jira] [Commented] (SPARK-4440) Enhance the job progress API to expose more information

2014-11-23 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14222549#comment-14222549 ] Xuefu Zhang commented on SPARK-4440: CC: [~sandyr], [~rxin] Enhance the job progress

[jira] [Created] (SPARK-4567) Make SparkJobInfo and SparkStageInfo serializable

2014-11-23 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created SPARK-4567: -- Summary: Make SparkJobInfo and SparkStageInfo serializable Key: SPARK-4567 URL: https://issues.apache.org/jira/browse/SPARK-4567 Project: Spark Issue Type:

[jira] [Commented] (SPARK-4567) Make SparkJobInfo and SparkStageInfo serializable

2014-11-23 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14222591#comment-14222591 ] Xuefu Zhang commented on SPARK-4567: CC: [~sandyr], [~rxin] Make SparkJobInfo and

[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-11-23 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14222593#comment-14222593 ] Xuefu Zhang commented on SPARK-2321: I have created SPARK-4567 to track the request of

[jira] [Created] (SPARK-4290) Provide an equivalent functionality of distributed cache as MR does

2014-11-06 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created SPARK-4290: -- Summary: Provide an equivalent functionality of distributed cache as MR does Key: SPARK-4290 URL: https://issues.apache.org/jira/browse/SPARK-4290 Project: Spark

[jira] [Commented] (SPARK-4290) Provide an equivalent functionality of distributed cache as MR does

2014-11-06 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201421#comment-14201421 ] Xuefu Zhang commented on SPARK-4290: CC: [~sandyr], [~rxin] Provide an equivalent

[jira] [Commented] (SPARK-4290) Provide an equivalent functionality of distributed cache as MR does

2014-11-06 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201496#comment-14201496 ] Xuefu Zhang commented on SPARK-4290: Hi [~rxin], by out of box, do you mean

[jira] [Commented] (SPARK-4290) Provide an equivalent functionality of distributed cache as MR does

2014-11-06 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201561#comment-14201561 ] Xuefu Zhang commented on SPARK-4290: Hi [~rxin], from the documentation of above java

[jira] [Commented] (SPARK-4290) Provide an equivalent functionality of distributed cache as MR does

2014-11-06 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201621#comment-14201621 ] Xuefu Zhang commented on SPARK-4290: Yes, SparkContext#addFile() seems to be what we

[jira] [Comment Edited] (SPARK-4290) Provide an equivalent functionality of distributed cache as MR does

2014-11-06 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201621#comment-14201621 ] Xuefu Zhang edited comment on SPARK-4290 at 11/7/14 5:37 AM: -

[jira] [Created] (SPARK-3691) Provide a mini cluster for testing system built on Spark

2014-09-25 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created SPARK-3691: -- Summary: Provide a mini cluster for testing system built on Spark Key: SPARK-3691 URL: https://issues.apache.org/jira/browse/SPARK-3691 Project: Spark Issue

[jira] [Commented] (SPARK-3691) Provide a mini cluster for testing system built on Spark

2014-09-25 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14148071#comment-14148071 ] Xuefu Zhang commented on SPARK-3691: cc [~sandyr] Provide a mini cluster for testing

[jira] [Comment Edited] (SPARK-3691) Provide a mini cluster for testing system built on Spark

2014-09-25 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14148071#comment-14148071 ] Xuefu Zhang edited comment on SPARK-3691 at 9/25/14 6:21 PM: -

[jira] [Created] (SPARK-3693) Cached Hadoop RDD always return rows with the same value

2014-09-25 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created SPARK-3693: -- Summary: Cached Hadoop RDD always return rows with the same value Key: SPARK-3693 URL: https://issues.apache.org/jira/browse/SPARK-3693 Project: Spark Issue

[jira] [Updated] (SPARK-3693) Cached Hadoop RDD always return rows with the same value

2014-09-25 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated SPARK-3693: --- Description: While trying RDD caching, it's found that caching a Hadoop RDD causes data correctness

[jira] [Commented] (SPARK-3693) Cached Hadoop RDD always return rows with the same value

2014-09-25 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14148295#comment-14148295 ] Xuefu Zhang commented on SPARK-3693: cc [~rxin], [~sandyr] Cached Hadoop RDD always

[jira] [Commented] (SPARK-3693) Cached Hadoop RDD always return rows with the same value

2014-09-25 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14148318#comment-14148318 ] Xuefu Zhang commented on SPARK-3693: Thanks, guys. We are fine with the workaround.

[jira] [Commented] (SPARK-3621) Provide a way to broadcast an RDD (instead of just a variable made of the RDD) so that a job can access

2014-09-22 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143231#comment-14143231 ] Xuefu Zhang commented on SPARK-3621: In my limited understanding, to broadcast a

[jira] [Commented] (SPARK-3622) Provide a custom transformation that can output multiple RDDs

2014-09-22 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143921#comment-14143921 ] Xuefu Zhang commented on SPARK-3622: They are related but not exactly the same.

[jira] [Commented] (SPARK-3621) Provide a way to broadcast an RDD (instead of just a variable made of the RDD) so that a job can access

2014-09-21 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142466#comment-14142466 ] Xuefu Zhang commented on SPARK-3621: I understand RDD is a concept existing only in

[jira] [Commented] (SPARK-3622) Provide a custom transformation that can output multiple RDDs

2014-09-21 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142881#comment-14142881 ] Xuefu Zhang commented on SPARK-3622: Thanks for your comments, [~pwendell]. I

[jira] [Comment Edited] (SPARK-3622) Provide a custom transformation that can output multiple RDDs

2014-09-21 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142881#comment-14142881 ] Xuefu Zhang edited comment on SPARK-3622 at 9/22/14 4:39 AM: -

[jira] [Created] (SPARK-3621) Provide a way to broadcast an RDD (instead of just a variable made of the RDD) so that a job can access

2014-09-20 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created SPARK-3621: -- Summary: Provide a way to broadcast an RDD (instead of just a variable made of the RDD) so that a job can access Key: SPARK-3621 URL: https://issues.apache.org/jira/browse/SPARK-3621

[jira] [Created] (SPARK-3622) Provide a custom transformation that can output multiple RDDs

2014-09-20 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created SPARK-3622: -- Summary: Provide a custom transformation that can output multiple RDDs Key: SPARK-3622 URL: https://issues.apache.org/jira/browse/SPARK-3622 Project: Spark

[jira] [Commented] (SPARK-2895) Support mapPartitionsWithContext in Spark Java API

2014-09-02 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118759#comment-14118759 ] Xuefu Zhang commented on SPARK-2895: Hi [~rxin], could you review [~chengxiang li]'s

[jira] [Commented] (SPARK-2741) Publish version of spark assembly which does not contain Hive

2014-07-29 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14078813#comment-14078813 ] Xuefu Zhang commented on SPARK-2741: cc: [~rxin], [~sandyr] Publish version of spark

[jira] [Commented] (SPARK-2741) Publish version of spark assembly which does not contain Hive

2014-07-29 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14078835#comment-14078835 ] Xuefu Zhang commented on SPARK-2741: I did see a profile about Hive. However, it seems

[jira] [Created] (SPARK-2688) Need a way to run multiple data pipeline concurrently

2014-07-25 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created SPARK-2688: -- Summary: Need a way to run multiple data pipeline concurrently Key: SPARK-2688 URL: https://issues.apache.org/jira/browse/SPARK-2688 Project: Spark Issue Type:

[jira] [Commented] (SPARK-2688) Need a way to run multiple data pipeline concurrently

2014-07-25 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074368#comment-14074368 ] Xuefu Zhang commented on SPARK-2688: cc: [~rxin] [~sandyr] Need a way to run

[jira] [Updated] (SPARK-2688) Need a way to run multiple data pipeline concurrently

2014-07-25 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated SPARK-2688: --- Description: Suppose we want to do the following data processing: {code} rdd1 - rdd2 - rdd3

[jira] [Commented] (SPARK-2420) Change Spark build to minimize library conflicts

2014-07-23 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072778#comment-14072778 ] Xuefu Zhang commented on SPARK-2420: Is shading guava in Spark build a reasonable

[jira] [Commented] (SPARK-2420) Change Spark build to minimize library conflicts

2014-07-18 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066600#comment-14066600 ] Xuefu Zhang commented on SPARK-2420: {quote} 2. For jetty, it was a problem with Hive

[jira] [Commented] (SPARK-2420) Change Spark build to minimize library conflicts

2014-07-16 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063998#comment-14063998 ] Xuefu Zhang commented on SPARK-2420: Thanks for your comments, [~srowen]. I mostly

[jira] [Commented] (SPARK-2420) Change Spark build to minimize library conflicts

2014-07-16 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064168#comment-14064168 ] Xuefu Zhang commented on SPARK-2420: As to guava conflict, HIVE-7387 has more details

[jira] [Commented] (SPARK-2420) Change Spark build to minimize library conflicts

2014-07-15 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063049#comment-14063049 ] Xuefu Zhang commented on SPARK-2420: [~rxin] As pointed above, Hive and its dependent

[jira] [Commented] (SPARK-2420) Change Spark build to minimize library conflicts

2014-07-11 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058757#comment-14058757 ] Xuefu Zhang commented on SPARK-2420: [~rxin] Is it possible to downgrade or shade

[jira] [Created] (SPARK-2420) Change Spark build to minimize library conflicts

2014-07-09 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created SPARK-2420: -- Summary: Change Spark build to minimize library conflicts Key: SPARK-2420 URL: https://issues.apache.org/jira/browse/SPARK-2420 Project: Spark Issue Type: Wish

[jira] [Commented] (SPARK-2421) Spark should treat writable as serializable for keys

2014-07-09 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056677#comment-14056677 ] Xuefu Zhang commented on SPARK-2421: CC [~rxin], [~hshreedharan] Spark should treat