[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2014-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-68238365
  
  [Test build #24855 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24855/consoleFull)
 for   PR 3820 at commit 
[`d44831a`](https://github.com/apache/spark/commit/d44831a2462b2c049b0222fbb7b8e08023d1f67c).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2014-12-28 Thread adrian-wang
GitHub user adrian-wang opened a pull request:

https://github.com/apache/spark/pull/3820

[SPARK-4987] [SQL] parquet timestamp type support



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/adrian-wang/spark parquettimestamp

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3820.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3820


commit d44831a2462b2c049b0222fbb7b8e08023d1f67c
Author: Daoyuan Wang 
Date:   2014-12-29T07:41:13Z

parquet timestamp type support







[GitHub] spark pull request: Added Java serialization util functions back i...

2014-12-28 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3792#discussion_r2230
  
--- Diff: network/common/src/main/java/org/apache/spark/network/util/JavaUtils.java ---
@@ -41,6 +41,34 @@
 public class JavaUtils {
   private static final Logger logger = LoggerFactory.getLogger(JavaUtils.class);
 
+  /** Deserialize a byte array using Java serialization. */
+  public static <T> T deserialize(byte[] bytes) {
+    try {
+      ObjectInputStream is = new ObjectInputStream(new ByteArrayInputStream(bytes));
+      Object out = is.readObject();
+      is.close();
+      return (T) out;
+    } catch (ClassNotFoundException e) {
+      throw new RuntimeException("Could not deserialize object", e);
--- End diff --

IllegalStateException doesn't seem to be an accurate description of this, 
does it? Of course, you can technically label everything as an illegal state ...
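
For context, here is a minimal self-contained sketch of the kind of helper being reviewed. It is not the code from the PR: the class name `SerDeSketch` and the `serialize` helper are assumptions made for illustration. It keeps a plain `RuntimeException` for the class-not-found case, as the quoted diff does, and wraps I/O failures in `UncheckedIOException`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration class; not part of Spark.
public class SerDeSketch {
  /** Serialize an object to a byte array using Java serialization. */
  public static byte[] serialize(Object o) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    // try-with-resources flushes and closes the stream before the bytes are read.
    try (ObjectOutputStream os = new ObjectOutputStream(bytes)) {
      os.writeObject(o);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Deserialize a byte array using Java serialization. */
  @SuppressWarnings("unchecked")
  public static <T> T deserialize(byte[] bytes) {
    try (ObjectInputStream is = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return (T) is.readObject();
    } catch (ClassNotFoundException e) {
      // A missing class says nothing about the state of the caller, so a
      // generic RuntimeException with the cause attached is used here.
      throw new RuntimeException("Could not deserialize object", e);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Whichever exception type is chosen, attaching the original exception as the cause keeps the underlying failure visible to callers.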





[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-28 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/3643#issuecomment-68237654
  
@jkbradley Do you have any remaining concerns? Is this ready to merge? Thanks.





[GitHub] spark pull request: [SPARK-4984][Core][UI] Adding a pop-up contain...

2014-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3819#issuecomment-68236040
  
  [Test build #24854 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24854/consoleFull)
 for   PR 3819 at commit 
[`8fe74b0`](https://github.com/apache/spark/commit/8fe74b03e63e36d370ba61946181194b1f0c84a2).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4984][Core][UI] Adding a pop-up contain...

2014-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3819#issuecomment-68235708
  
  [Test build #24853 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24853/consoleFull)
 for   PR 3819 at commit 
[`4909bb4`](https://github.com/apache/spark/commit/4909bb4e9d3c4fac9c30a67c7a3e30f232e934c9).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4984][Core][UI] Adding a pop-up contain...

2014-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3819#issuecomment-68235709
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24853/
Test FAILed.





[GitHub] spark pull request: [SPARK-4984][Core][UI] Adding a pop-up contain...

2014-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3819#issuecomment-68235678
  
  [Test build #24853 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24853/consoleFull)
 for   PR 3819 at commit 
[`4909bb4`](https://github.com/apache/spark/commit/4909bb4e9d3c4fac9c30a67c7a3e30f232e934c9).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4871][SQL] Show sql statement in spark ...

2014-12-28 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/3718#issuecomment-68235672
  
Filed a PR to add a pop-up for the full job description: #3819





[GitHub] spark pull request: [SPARK-4984][Core][UI] Adding a pop-up contain...

2014-12-28 Thread scwf
GitHub user scwf opened a pull request:

https://github.com/apache/spark/pull/3819

[SPARK-4984][Core][UI] Adding a pop-up containing the full for job 
description when it is very long

In some cases the job description is very long, for example a long SQL
statement (see #3718).
This PR adds a pop-up that shows the full job description when it is long.

![image](https://cloud.githubusercontent.com/assets/7018048/5566579/2f45fb7a-8f68-11e4-87b2-a014269afaff.png)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/scwf/spark popup-descrip-ui

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3819.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3819


commit 4593db95632d57445b3912532a0e2d082671c232
Author: wangfei 
Date:   2014-12-29T04:00:54Z

draft for pop up the description of job

commit 9e2f97e0eb2624b3da2ff64793d9b01015ac7287
Author: wangfei 
Date:   2014-12-29T05:54:56Z

minor improvement

commit 4909bb4e9d3c4fac9c30a67c7a3e30f232e934c9
Author: wangfei 
Date:   2014-12-29T06:06:48Z

adding pointer







[GitHub] spark pull request: [SPARK-4982][DOC]The `spark.ui.retainedJobs` m...

2014-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3818#issuecomment-68230711
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-2075][Core] Make the compiler generate ...

2014-12-28 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/3740#issuecomment-68230642
  
I learned a lot while reviewing this PR, thanks.





[GitHub] spark pull request: [SPARK-4982][DOC]The `spark.ui.retainedJobs` m...

2014-12-28 Thread wangxiaojing
GitHub user wangxiaojing opened a pull request:

https://github.com/apache/spark/pull/3818

[SPARK-4982][DOC]The `spark.ui.retainedJobs` meaning is wrong in `Spark UI` 
configuration



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangxiaojing/spark SPARK-4982

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3818.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3818


commit fe2ad5f18617486ff090ae4498117324b7d4be75
Author: wangxiaojing 
Date:   2014-12-29T03:44:14Z

change stages to jobs







[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize Scala/Java test...

2014-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3564#issuecomment-68229094
  
  [Test build #24852 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24852/consoleFull)
 for   PR 3564 at commit 
[`00e2b93`](https://github.com/apache/spark/commit/00e2b93505a2fe973eb75c0306ad6acaddcf9685).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize Scala/Java test...

2014-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3564#issuecomment-68229098
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24852/
Test FAILed.





[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize Scala/Java test...

2014-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3564#issuecomment-68228528
  
  [Test build #24852 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24852/consoleFull)
 for   PR 3564 at commit 
[`00e2b93`](https://github.com/apache/spark/commit/00e2b93505a2fe973eb75c0306ad6acaddcf9685).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4465] runAsSparkUser doesn't affect Tas...

2014-12-28 Thread jongyoul
Github user jongyoul commented on the pull request:

https://github.com/apache/spark/pull/3741#issuecomment-68227400
  
@tnachen switch_user is disabled. The user id running Mesos is `hdfs` on both
master and slave, and the user ids running apps are `1001079` and `rake`. I've
run a simple app that I wrote to test reading data from HDFS and writing data
to HDFS, with a groupBy function. We use Spark in a multi-user environment to
submit several apps that run concurrently.





[GitHub] spark pull request: [SPARK-4962] [CORE] Put TaskScheduler.start ba...

2014-12-28 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3810#discussion_r22300276
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -55,13 +57,9 @@ private[spark] class Client(
* 
-
 */
 
   /**
-   * Submit an application running our ApplicationMaster to the ResourceManager.
-   *
-   * The stable Yarn API provides a convenience method (YarnClient#createApplication) for
-   * creating applications and setting up the application submission context. This was not
-   * available in the alpha API.
+   * Create an application running our ApplicationMaster to the ResourceManager.
*/
-  override def submitApplication(): ApplicationId = {
+  override def createApplication(): ApplicationId = {
--- End diff --

There are several new API methods and changes here. I don't think they're 
explained or motivated.





[GitHub] spark pull request: [SPARK-4962] [CORE] Put TaskScheduler.start ba...

2014-12-28 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3810#discussion_r22300275
  
--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -78,8 +78,8 @@ class DAGSchedulerSuite extends TestKit(ActorSystem("DAGSchedulerSuite")) with F
   val taskScheduler = new TaskScheduler() {
     override def rootPool: Pool = null
     override def schedulingMode: SchedulingMode = SchedulingMode.NONE
-    override def start() = {}
-    override def stop() = {}
+    override def start() = { started.compareAndSet(false, true) }
--- End diff --

`compareAndSet` is not needed here. Just `set`
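
A short illustration of the distinction, using `java.util.concurrent.atomic.AtomicBoolean` directly. The class and method names are hypothetical; only the `started` flag mirrors the quoted diff. `compareAndSet(false, true)` is mainly useful when the caller needs to know whether it was the one that flipped the flag; when the result is ignored, `set(true)` leaves the flag in the same final state.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration class; not part of Spark.
public class StartFlagSketch {
  private final AtomicBoolean started = new AtomicBoolean(false);

  /** Unconditional write: sufficient when the return value is not used. */
  public void start() {
    started.set(true);
  }

  /** Conditional write: returns true only for the first caller. */
  public boolean startOnce() {
    return started.compareAndSet(false, true);
  }

  public boolean isStarted() {
    return started.get();
  }
}
```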





[GitHub] spark pull request: [SPARK-4962] [CORE] Put TaskScheduler.start ba...

2014-12-28 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3810#discussion_r22300274
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -333,9 +333,15 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
       new SparkException("DAGScheduler cannot be initialized due to %s".format(e.getMessage))
   }
 
-  // start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's
-  // constructor
-  taskScheduler.start()
+  if (conf.getBoolean("spark.scheduler.app.slowstart", false) && master == "yarn-client") {
--- End diff --

Is there a new undocumented system property for this? Why only client mode?
My overall impression is that this adds different code paths and behaviors
in different modes for little gain. I am not sure the description makes the case
that it's significant enough to bother.





[GitHub] spark pull request: SPARK-4159 [CORE] Maven build doesn't run JUni...

2014-12-28 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3651#issuecomment-68219056
  
@JoshRosen I think it's a small pain to add to the POM. You can bind a 
different execution of the Maven clean plugin to `test` or `test-compile`. I 
don't think it's hard, just more stuff in the POM. I had punted on it but if 
anyone votes for adding it I will do so.





[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-12-28 Thread derrickburns
Github user derrickburns commented on the pull request:

https://github.com/apache/spark/pull/2634#issuecomment-68218597
  
That would be great!

On Sat, Dec 27, 2014 at 12:59 PM, Nicholas Chammas  wrote:

> @mengxr  Now that 1.2.0 is out, can we
> schedule a rough timeframe for reviewing this patch?





[GitHub] spark pull request: [SPARK-4417] New API: sample RDD to fixed numb...

2014-12-28 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/3723#issuecomment-68216759
  
My biggest problem with this is that, while the existing `sample` is a 
transformation, `sampleByCount` is another one of those unholy beasts that is neither 
an action nor a transformation -- meaning that, while it transforms an RDD into 
another RDD, it isn't lazy while doing so, but rather embeds several actions 
(`count`) and makes use of another unholy beast (`zipWithIndex`), all of which 
means that invoking `sampleByCount` eagerly launches several jobs in order to 
create the new RDD.

This is by no means the only eager transformation (or whatever we end up 
calling these unholy beasts), since there are a handful of others that already 
exist in Spark; but I am really hesitant to add another.  What we need is a 
larger strategy and re-organization to properly handle, name and document eager 
transformations, but that is well beyond the scope of this single PR.  In the 
meantime, eager transformations are just conveniences (inconveniences if you 
are trying to launch jobs asynchronously) that package up one or more actions. 
They can always be broken up into multiple explicit and ordinary 
transformations and actions (as Sean was effectively suggesting earlier), so 
none of them are strictly necessary to achieve their functionality.

I'm really hesitant to add `sampleByCount` to the Spark API and thereby to 
the list of eager transformations that we need to somehow fix in the future.  
Perhaps a better way to handle such convenience packaging of transformations 
and actions on RDDs is to include them in [Spark 
Packages](http://spark-packages.org/). 
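
The decomposition described above can be sketched on a plain Java list, as an analogy only: this is not Spark code and not the PR's implementation, and every name below is hypothetical. The caller first performs one explicit count, derives a fraction from it, and then applies an ordinary fractional (Bernoulli) sample. The result has approximately the target size, not exactly that size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration class; a plain-collections analogy, not Spark.
public class ExplicitSampleSketch {
  /** Step 1, the eager part: the size must be known before sampling. */
  public static long count(List<Integer> data) {
    return data.size();
  }

  /** Fraction that yields about `target` elements out of `total`. */
  public static double fractionFor(long target, long total) {
    if (total <= 0) {
      return 0.0;
    }
    return Math.min(1.0, (double) target / total);
  }

  /** Step 2: keep each element independently with probability `fraction`. */
  public static List<Integer> sample(List<Integer> data, double fraction, long seed) {
    Random rng = new Random(seed);
    List<Integer> out = new ArrayList<>();
    for (Integer x : data) {
      if (rng.nextDouble() < fraction) {
        out.add(x);
      }
    }
    return out;
  }
}
```

Keeping the two steps separate makes the cost visible at the call site: the caller decides when the count runs and can reuse its result.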





[GitHub] spark pull request: [SPARK-4417] New API: sample RDD to fixed numb...

2014-12-28 Thread ilganeli
Github user ilganeli commented on the pull request:

https://github.com/apache/spark/pull/3723#issuecomment-68211996
  
Hello, could anyone please provide any more feedback on this patch and 
ideally get this merged? Thanks!





[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...

2014-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3778#issuecomment-68204981
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24851/
Test PASSed.





[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...

2014-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3778#issuecomment-68204978
  
  [Test build #24851 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24851/consoleFull)
 for   PR 3778 at commit 
[`527e6ce`](https://github.com/apache/spark/commit/527e6cee23dca18c2f035c772f659394c3c700d5).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...

2014-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3778#issuecomment-68204880
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24850/
Test PASSed.





[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...

2014-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3778#issuecomment-68204877
  
  [Test build #24850 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24850/consoleFull)
 for   PR 3778 at commit 
[`37022d1`](https://github.com/apache/spark/commit/37022d11cfcb9b316cf5cbe48db2ba7fd4b1f918).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...

2014-12-28 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/3778#issuecomment-68204050
  
Updated. /cc @liancheng, any comments here? 





[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...

2014-12-28 Thread scwf
Github user scwf commented on a diff in the pull request:

https://github.com/apache/spark/pull/3778#discussion_r22297307
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -307,7 +309,29 @@ object BooleanSimplification extends Rule[LogicalPlan] {
   case (l, Literal(true, BooleanType)) => l
   case (Literal(false, BooleanType), _) => Literal(false)
   case (_, Literal(false, BooleanType)) => Literal(false)
-  case (_, _) => and
+  // a && a => a
+  case (l, r) if l fastEquals r => l
+  case (_, _) =>
+    val lhsSet = splitDisjunctivePredicates(left).toSet
+    val rhsSet = splitDisjunctivePredicates(right).toSet
+    val common = lhsSet.intersect(rhsSet)
+    val ldiff = lhsSet.diff(common)
+    val rdiff = rhsSet.diff(common)
+    if (common.size == 0) {

TODO: remove this `if`.
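
The set-based rewrite in the diff above can be illustrated with a small standalone sketch. This is an assumption-laden toy, not the Catalyst rule: predicates are plain strings, the class and method names are hypothetical, and how the PR itself handles an empty difference is not shown in the quoted hunk. The idea is that for `(a || b) && (a || c)` the shared disjunct `a` can be pulled out, giving `a || (b && c)`.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration class; predicates are modelled as strings.
public class CommonFactorSketch {
  /**
   * Simplify (l1 || l2 || ...) && (r1 || r2 || ...), where each side is given
   * as the set of its disjuncts. Returns the rewritten expression as text.
   */
  public static String simplifyAnd(Set<String> lhs, Set<String> rhs) {
    Set<String> common = new LinkedHashSet<>(lhs);
    common.retainAll(rhs);
    if (common.isEmpty()) {
      // Nothing shared: leave the conjunction as it is.
      return "(" + or(lhs) + ") && (" + or(rhs) + ")";
    }
    Set<String> ldiff = new LinkedHashSet<>(lhs);
    ldiff.removeAll(common);
    Set<String> rdiff = new LinkedHashSet<>(rhs);
    rdiff.removeAll(common);
    if (ldiff.isEmpty() || rdiff.isEmpty()) {
      // Absorption: (a) && (a || c) is equivalent to a.
      return or(common);
    }
    // Distribution: (a || b) && (a || c) is equivalent to a || (b && c).
    return or(common) + " || ((" + or(ldiff) + ") && (" + or(rdiff) + "))";
  }

  private static String or(Set<String> terms) {
    return String.join(" || ", terms);
  }
}
```

With disjunct sets `{a, b}` and `{a, c}` this returns `a || ((b) && (c))`. The dual rewrite for `||` over conjunct sets follows the same pattern with the operators swapped.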





[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...

2014-12-28 Thread scwf
Github user scwf commented on a diff in the pull request:

https://github.com/apache/spark/pull/3778#discussion_r22297306
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -316,7 +340,29 @@ object BooleanSimplification extends Rule[LogicalPlan] {
   case (_, Literal(true, BooleanType)) => Literal(true)
   case (Literal(false, BooleanType), r) => r
   case (l, Literal(false, BooleanType)) => l
-  case (_, _) => or
+  // a || a => a
+  case (l, r) if l fastEquals r => l
+  case (_, _) =>
+    val lhsSet = splitConjunctivePredicates(left).toSet
+    val rhsSet = splitConjunctivePredicates(right).toSet
+    val common = lhsSet.intersect(rhsSet)
+    val ldiff = lhsSet.diff(common)
+    val rdiff = rhsSet.diff(common)
+    if (common.size == 0) {
--- End diff --

TODO: remove this `if`.





[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...

2014-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3778#issuecomment-68203564
  
  [Test build #24851 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24851/consoleFull)
 for   PR 3778 at commit 
[`527e6ce`](https://github.com/apache/spark/commit/527e6cee23dca18c2f035c772f659394c3c700d5).
 * This patch merges cleanly.





[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...

2014-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3778#issuecomment-68203451
  
  [Test build #24850 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24850/consoleFull)
 for   PR 3778 at commit 
[`37022d1`](https://github.com/apache/spark/commit/37022d11cfcb9b316cf5cbe48db2ba7fd4b1f918).
 * This patch merges cleanly.

