[GitHub] spark pull request: Fix: 'Create table ..as select ..from..order b...
GitHub user guowei2 opened a pull request:

    https://github.com/apache/spark/pull/3821

    Fix: 'Create table ..as select ..from..order by .. limit 10' report error when one col is a Decimal

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/guowei2/spark SPARK-4988

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3821.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3821

----
commit 1bab9e4b782e62485f01f4f650a54c5ccb86f2a1
Author: guowei2 <guow...@asiainfo.com>
Date:   2014-12-29T07:57:51Z

    Fix: 'Create table ..as select ..from..order by .. limit 10' report error when one col is a Decimal

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Fix: 'Create table ..as select ..from..order b...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3821#issuecomment-68238731

Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4985] [SQL] parquet support for date ty...
GitHub user adrian-wang opened a pull request:

    https://github.com/apache/spark/pull/3822

    [SPARK-4985] [SQL] parquet support for date type

This PR might have some issues with #3732, and would have merge conflicts with #3820, so the review can be delayed until those two are merged.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adrian-wang/spark parquetdate

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3822.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3822

----
commit 0ebe356bceff169fe89134bed603a17514dc1108
Author: Daoyuan Wang <daoyuan.w...@intel.com>
Date:   2014-12-29T07:59:37Z

    parquet support for date type
[GitHub] spark pull request: [SPARK-4985] [SQL] parquet support for date ty...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3822#issuecomment-68238990

[Test build #24856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24856/consoleFull) for PR 3822 at commit [`0ebe356`](https://github.com/apache/spark/commit/0ebe356bceff169fe89134bed603a17514dc1108).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4984][CORE][UI] Adding a pop-up contain...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-68239441

[Test build #24854 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24854/consoleFull) for PR 3819 at commit [`8fe74b0`](https://github.com/apache/spark/commit/8fe74b03e63e36d370ba61946181194b1f0c84a2).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4984][CORE][UI] Adding a pop-up contain...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-68239445

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24854/
[GitHub] spark pull request: [SPARK-4990]to find default properties file, s...
GitHub user WangTaoTheTonic opened a pull request:

    https://github.com/apache/spark/pull/3823

    [SPARK-4990]to find default properties file, search SPARK_CONF_DIR first

https://issues.apache.org/jira/browse/SPARK-4990

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/WangTaoTheTonic/spark SPARK-4990

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3823.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3823

----
commit c5a85eb37389f3c849129267fcef0dfa608d09c6
Author: WangTaoTheTonic <barneystin...@aliyun.com>
Date:   2014-12-29T08:17:32Z

    to find default properties file, search SPARK_CONF_DIR first
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3823#issuecomment-68239625

[Test build #24857 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24857/consoleFull) for PR 3823 at commit [`c5a85eb`](https://github.com/apache/spark/commit/c5a85eb37389f3c849129267fcef0dfa608d09c6).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
GitHub user liyezhang556520 opened a pull request:

    https://github.com/apache/spark/pull/3824

    [SPARK-4989][CORE] avoid wrong eventlog conf cause cluster down in standalone mode

When enabling the event log in standalone mode, a wrong configuration can bring the standalone cluster down (the Master restarts and loses its connection with the workers). How to reproduce: just give an invalid value to spark.eventLog.dir, for example: spark.eventLog.dir=hdfs://tmp/logdir1, hdfs://tmp/logdir2. This will throw an IllegalArgumentException, which will cause the Master to restart, leaving the whole cluster unavailable.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liyezhang556520/spark wrongConf4Cluster

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3824.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3824

----
commit 5c1fa33799bc503ac1e2d5e9838e8e364bf1f61f
Author: Zhang, Liye <liye.zh...@intel.com>
Date:   2014-12-26T08:23:53Z

    cache exceptions when eventlog with wrong conf

commit 12eee8590fb9899c267b29d3a129a169b6cf6ec1
Author: Zhang, Liye <liye.zh...@intel.com>
Date:   2014-12-26T08:49:04Z

    add more message in log and on webUI
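The failure mode described in the PR above can be sketched outside Spark: a malformed directory value fails URI parsing, and the fix direction is to catch that failure and report it rather than let it propagate and restart the Master. This is a minimal sketch under that assumption; the method name and return convention are hypothetical, not Spark's actual API.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class EventLogConfSketch {
    // Hypothetical helper: validate an event-log directory the way a URI-based
    // client would, and return the failure as data instead of rethrowing, so a
    // bad spark.eventLog.dir value cannot take the whole Master down.
    static String validateEventLogDir(String dir) {
        try {
            URI uri = new URI(dir); // malformed values (e.g. embedded spaces) throw here
            return "ok: scheme=" + uri.getScheme();
        } catch (URISyntaxException | IllegalArgumentException e) {
            return "error: " + e.getMessage(); // surface in the log / web UI instead
        }
    }

    public static void main(String[] args) {
        System.out.println(validateEventLogDir("hdfs://tmp/logdir"));
        // The reproducer from the PR description: two comma-separated URIs.
        System.out.println(validateEventLogDir("hdfs://tmp/logdir1, hdfs://tmp/logdir2"));
    }
}
```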
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3824#issuecomment-68239849

[Test build #24858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24858/consoleFull) for PR 3824 at commit [`12eee85`](https://github.com/apache/spark/commit/12eee8590fb9899c267b29d3a129a169b6cf6ec1).
* This patch merges cleanly.
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an Arti...
Github user bgreeven commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-68241121

I have compared the ANN with Support Vector Machine (SVM) and Logistic Regression. I have tested using a master local(5) configuration, and applied the MNIST dataset, using 60,000 training examples and 10,000 test examples. Since SVM and Logistic Regression are binary classifiers, I applied two methods to convert them to a multinary classifier: majority vote and ad-hoc tree.

For the majority vote, I trained 10 different models, each to distinguish a single class from the rest. The classification was done by looking at which model gives the highest positive output. I performed 100 iterations per class, leading to 1000 iterations in total. For ANN, I used a single hidden layer with 32 nodes (not counting the bias nodes). I performed 100 iterations. For LBFGS I used tolerance 1e-5. Because of the poor performance of SVM+SGD, I re-ran it with 1000 iterations per class (10000 in total). The performance was similar.

I found the following results for the test set:

```
+------------------------------+----------+--------+-----------+-------------+
| Algorithm                    | Accuracy |   Time | # correct | # incorrect |
+------------------------------+----------+--------+-----------+-------------+
| ANN (LBFGS)                  |    95.1% |   665s |      9510 |         490 |
| Logistic Regression (SGD)    |    72.0% |  1325s |      7202 |        2798 |
| Logistic Regression (LBFGS)  |    86.6% |  1635s |      8658 |        1342 |
| SVM (SGD)                    |    18.6% |  1294s |      1860 |        8140 |
| (SVM (SGD) 1000 iterations)  |    18.5% | 12658s |      1850 |        8150 |
| SVM (LBFGS)                  |    86.2% |  1453s |      8622 |        1378 |
+------------------------------+----------+--------+-----------+-------------+
```

I also created an ad-hoc tree model. This separates the collection of training examples into two approximately equal-size partitions, where I tried to separate the numbers based on how different they look. I continued with the two separated partitions, until each output class corresponded to a single number.

The partitioning choice was made manually and intuitively, as follows:

```
0123456789 -> (04689, 12357)
04689 -> (068, 49)
068 -> (0, 68)
68 -> (6, 8)
49 -> (4, 9)
12357 -> (17, 235)
17 -> (1, 7)
235 -> (2, 35)
35 -> (3, 5)
```

Notice that this leads to only nine classification runs, not ten as in the voting scheme. After training, I used the trained models to classify the test set. I got the following results (same parameters as with the voting scheme):

```
+------------------------------+----------+--------+-----------+-------------+
| Algorithm                    | Accuracy |   Time | # correct | # incorrect |
+------------------------------+----------+--------+-----------+-------------+
| ANN (LBFGS)                  |    95.1% |   665s |      9510 |         490 |
| Logistic Regression (SGD)    |    82.3% |  1146s |      8228 |        1772 |
| Logistic Regression (LBFGS)  |    87.2% |  1273s |      8719 |        1281 |
| SVM (SGD)                    |    61.1% |  1148s |      6113 |        3887 |
| SVM (LBFGS)                  |    87.5% |  1182s |      8753 |        1247 |
+------------------------------+----------+--------+-----------+-------------+
```

Notice that I left ANN in the table because this is to compare ANN with other algorithms. Since ANN is a multinary classifier by nature, it didn't use the ad-hoc tree.

It would be great if someone could verify my results. I am particularly amazed by the low performance of SVM+SGD with voting, and the increase with the ad-hoc tree. I used the same code for SGD and LBFGS, and only changed the optimiser and related parameters.
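The "majority vote" scheme described in the comment above can be sketched generically: one binary scorer per class, trained to distinguish that class from the rest, with classification picking the class whose scorer gives the highest output. The scorer representation below is an assumption for illustration, not MLlib's API.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

public class OneVsRestSketch {
    // One-vs-rest majority vote: each entry in `scorers` is a binary model for
    // one class; the predicted class is the index of the highest-scoring model.
    static int classify(List<ToDoubleFunction<double[]>> scorers, double[] features) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < scorers.size(); c++) {
            double s = scorers.get(c).applyAsDouble(features);
            if (s > bestScore) {
                bestScore = s;
                best = c;
            }
        }
        return best;
    }
}
```

With real models, each scorer would be the raw margin of an SVM or the probability output of a logistic regression trained on "class c vs. rest".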
[GitHub] spark pull request: Added Java serialization util functions back i...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3792#discussion_r22305806

--- Diff: network/common/src/main/java/org/apache/spark/network/util/JavaUtils.java ---
@@ -41,6 +41,34 @@ public class JavaUtils {
   private static final Logger logger = LoggerFactory.getLogger(JavaUtils.class);

+  /** Deserialize a byte array using Java serialization. */
+  public static <T> T deserialize(byte[] bytes) {
+    try {
+      ObjectInputStream is = new ObjectInputStream(new ByteArrayInputStream(bytes));
+      Object out = is.readObject();
+      is.close();
+      return (T) out;
+    } catch (ClassNotFoundException e) {
+      throw new RuntimeException("Could not deserialize object", e);
--- End diff --

I was thinking that you don't expect to not have the class on hand... But sure, IllegalArgumentException, because the bytes describe something invalid? The principle is to avoid RuntimeException since it is the superclass of all unchecked exceptions. If you ever wanted to catch this exception to deal with it, you'd have no hope of distinguishing it in a catch block. So reach for another standard and slightly more specific exception. A marginal argument here, but I think still common good practice in Java.
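The revision srowen is suggesting would look roughly like the sketch below: the same deserialize helper, but rethrowing ClassNotFoundException as the more specific IllegalArgumentException. This is an illustrative sketch, not the final patch; the class name and the serialize helper are added here only so the round trip is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class JavaUtilsSketch {
    /** Serialize an object using Java serialization (for the round trip below). */
    public static byte[] serialize(Object o) {
        try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
             ObjectOutputStream os = new ObjectOutputStream(baos)) {
            os.writeObject(o);
            os.flush();
            return baos.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("Could not serialize object", e);
        }
    }

    /** Deserialize a byte array using Java serialization. */
    @SuppressWarnings("unchecked")
    public static <T> T deserialize(byte[] bytes) {
        try (ObjectInputStream is = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) is.readObject();
        } catch (ClassNotFoundException e) {
            // The bytes describe something we cannot resolve: an invalid argument.
            // A caller's catch block can now distinguish this from an arbitrary
            // RuntimeException, which was the point of the review comment.
            throw new IllegalArgumentException("Could not deserialize object", e);
        } catch (IOException e) {
            throw new IllegalStateException("Could not deserialize object", e);
        }
    }
}
```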
[GitHub] spark pull request: [SPARK-4991][CORE] Worker should reconnect to ...
GitHub user liyezhang556520 opened a pull request:

    https://github.com/apache/spark/pull/3825

    [SPARK-4991][CORE] Worker should reconnect to Master when Master actor restart

This is a follow-up JIRA of [SPARK-4989](https://issues.apache.org/jira/browse/SPARK-4989). When the Master akka actor encounters an exception, the Master will restart (an akka actor restart, not a JVM restart), and all old information on the Master is cleared (including workers, applications, etc.). However, the workers are not aware of this at all. The state of the cluster is then: the master is on, and all workers are also on, but the master is not aware of the existence of the workers and will ignore all workers' heartbeats because the workers are not registered. So the whole cluster is not available. In this PR, the master will tell the worker that the connection is disconnected, so that the worker will register with the master again.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liyezhang556520/spark workerReconn

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3825.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3825

----
commit 107e5c58fdbe143fe6eabcfdb5d91d7b1184bb35
Author: Zhang, Liye <liye.zh...@intel.com>
Date:   2014-12-29T07:35:45Z

    worker reconnect to master when master restart for exception

commit e9c99e3969f6e058e46d65575d796d1289351318
Author: Zhang, Liye <liye.zh...@intel.com>
Date:   2014-12-29T08:51:50Z

    add log info
[GitHub] spark pull request: [SPARK-4991][CORE] Worker should reconnect to ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68241590

[Test build #24859 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24859/consoleFull) for PR 3825 at commit [`e9c99e3`](https://github.com/apache/spark/commit/e9c99e3969f6e058e46d65575d796d1289351318).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68241728

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24855/
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68241724

[Test build #24855 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24855/consoleFull) for PR 3820 at commit [`d44831a`](https://github.com/apache/spark/commit/d44831a2462b2c049b0222fbb7b8e08023d1f67c).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: SPARK-4921. TaskSetManager.dequeueTask returns...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3816#issuecomment-68241811

This seems like a fairly simple fix, but given that I don't 100% understand the discussion on SPARK-2294 / #1313, it might be good for @codingcat, @kayousterhout, @mridulm, or @mateiz to take a look.
[GitHub] spark pull request: [SQL] enable view test
GitHub user adrian-wang opened a pull request:

    https://github.com/apache/spark/pull/3826

    [SQL] enable view test

This is a follow-up of #3396; it just adds a test to the whitelist.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adrian-wang/spark viewtest

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3826.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3826

----
commit f105f68ef33381e272985866fb63a7e7775b76bb
Author: Daoyuan Wang <daoyuan.w...@intel.com>
Date:   2014-12-29T09:04:24Z

    enable view test
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3824#discussion_r22306027

--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -758,13 +760,14 @@ private[spark] class Master(
     // Event logging is enabled for this application, but no event logs are found
     val title = s"Application history not found (${app.id})"
     var msg = s"No event logs found for application $appName in $eventLogFile."
-    logWarning(msg)
+    val exception = URLEncoder.encode(Utils.exceptionString(fnf), "UTF-8")
+    logWarning(msg, fnf)
     msg += " Did you specify the correct logging directory?"
     msg = URLEncoder.encode(msg, "UTF-8")
-    app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&title=$title"
+    app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&exception=$exception&title=$title"
     false
   case e: Exception =>
-    // Relay exception message to application UI page
+    // Replay exception message to application UI page
--- End diff --

The word `Relay` was correct here.
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3824#discussion_r22306063

--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -758,13 +760,14 @@ private[spark] class Master(
     // Event logging is enabled for this application, but no event logs are found
     val title = s"Application history not found (${app.id})"
     var msg = s"No event logs found for application $appName in $eventLogFile."
-    logWarning(msg)
+    val exception = URLEncoder.encode(Utils.exceptionString(fnf), "UTF-8")
+    logWarning(msg, fnf)
     msg += " Did you specify the correct logging directory?"
     msg = URLEncoder.encode(msg, "UTF-8")
-    app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&title=$title"
+    app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&exception=$exception&title=$title"
--- End diff --

This will likely be too long in general to put in a URL. Did you add this URL param elsewhere?
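The concern raised here is that URL-encoding a full exception string can blow past practical URL length limits. A defensive variant would truncate before encoding, as in this sketch; the 2000-character cap is an assumption for illustration, not a value from the patch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UrlParamSketch {
    // Truncate a (possibly very long) exception string before URL-encoding it,
    // so the resulting query parameter stays within a practical URL budget.
    static String encodeForUrl(String s, int maxChars) {
        try {
            String truncated = s.length() > maxChars ? s.substring(0, maxChars) : s;
            return URLEncoder.encode(truncated, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }
}
```

Note that encoding can still expand each character up to three bytes (`%XX`), so the cap would be chosen with that expansion in mind.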
[GitHub] spark pull request: [SQL] enable view test
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3826#issuecomment-68242119

[Test build #24860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24860/consoleFull) for PR 3826 at commit [`f105f68`](https://github.com/apache/spark/commit/f105f68ef33381e272985866fb63a7e7775b76bb).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/3824#discussion_r22306099

--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -758,13 +760,14 @@ private[spark] class Master(
     // Event logging is enabled for this application, but no event logs are found
     val title = s"Application history not found (${app.id})"
     var msg = s"No event logs found for application $appName in $eventLogFile."
-    logWarning(msg)
+    val exception = URLEncoder.encode(Utils.exceptionString(fnf), "UTF-8")
+    logWarning(msg, fnf)
     msg += " Did you specify the correct logging directory?"
     msg = URLEncoder.encode(msg, "UTF-8")
-    app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&title=$title"
+    app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&exception=$exception&title=$title"
     false
   case e: Exception =>
-    // Relay exception message to application UI page
+    // Replay exception message to application UI page
--- End diff --

Yes, you are right: relay is correct, replay is not, thanks.
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3823#discussion_r22306164

--- Diff: bin/spark-submit ---
@@ -42,7 +42,10 @@ while (($#)); do
   shift
 done

-DEFAULT_PROPERTIES_FILE="$SPARK_HOME/conf/spark-defaults.conf"
+DEFAULT_PROPERTIES_FILE="$SPARK_CONF_DIR/spark-defaults.conf"
--- End diff --

`SparkSubmitArguments` already ultimately handles this case, right? What does this fix?
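The lookup order this PR proposes can be sketched as a pure function over the environment: prefer SPARK_CONF_DIR when it is set, otherwise fall back to $SPARK_HOME/conf. This is an illustrative sketch of the intended behavior, not the actual shell or `SparkSubmitArguments` code; the method name is hypothetical.

```java
import java.util.Map;

public class ConfDirSketch {
    // Resolve the default properties file the way the PR describes:
    // SPARK_CONF_DIR wins when present, $SPARK_HOME/conf is the fallback.
    static String defaultPropertiesFile(Map<String, String> env) {
        String confDir = env.get("SPARK_CONF_DIR");
        if (confDir == null || confDir.isEmpty()) {
            confDir = env.getOrDefault("SPARK_HOME", ".") + "/conf";
        }
        return confDir + "/spark-defaults.conf";
    }
}
```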
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/3824#discussion_r22306172

--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -758,13 +760,14 @@ private[spark] class Master(
     // Event logging is enabled for this application, but no event logs are found
     val title = s"Application history not found (${app.id})"
     var msg = s"No event logs found for application $appName in $eventLogFile."
-    logWarning(msg)
+    val exception = URLEncoder.encode(Utils.exceptionString(fnf), "UTF-8")
+    logWarning(msg, fnf)
     msg += " Did you specify the correct logging directory?"
     msg = URLEncoder.encode(msg, "UTF-8")
-    app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&title=$title"
+    app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&exception=$exception&title=$title"
--- End diff --

No
[GitHub] spark pull request: [SPARK-4985] [SQL] parquet support for date ty...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3822#issuecomment-68242437

[Test build #24856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24856/consoleFull) for PR 3822 at commit [`0ebe356`](https://github.com/apache/spark/commit/0ebe356bceff169fe89134bed603a17514dc1108).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4985] [SQL] parquet support for date ty...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3822#issuecomment-68242443

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24856/
[GitHub] spark pull request: [SPARK-4984][CORE][UI] Adding a pop-up contain...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-68242474 This can be turned into a simple mouseover with much less work, no CSS or Javascript. Just display the shortened version, and make the long description the `title` attribute of an enclosing tag like `span` or `div`. I think that might even be more intuitive?
[GitHub] spark pull request: [SPARK-4982][DOC]The `spark.ui.retainedJobs` m...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3818#issuecomment-68243274 Yes, looks like a simple copy-and-paste error. Doesn't even really need a JIRA, as there's nothing more to this.
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user WangTaoTheTonic commented on a diff in the pull request: https://github.com/apache/spark/pull/3823#discussion_r22306481

--- Diff: bin/spark-submit ---
```diff
@@ -42,7 +42,10 @@ while (($#)); do
   shift
 done
 
-DEFAULT_PROPERTIES_FILE="$SPARK_HOME/conf/spark-defaults.conf"
+DEFAULT_PROPERTIES_FILE="$SPARK_CONF_DIR/spark-defaults.conf"
```
--- End diff --

Used for:
```shell
if [[ "$SPARK_SUBMIT_DEPLOY_MODE" == "client" && -f "$SPARK_SUBMIT_PROPERTIES_FILE" ]]; then
  # Parse the properties file only if the special configs exist
  contains_special_configs=$(
    grep -e "spark.driver.extra*\|spark.driver.memory" "$SPARK_SUBMIT_PROPERTIES_FILE" | \
    grep -v "^[[:space:]]*#"
  )
  if [ -n "$contains_special_configs" ]; then
    export SPARK_SUBMIT_BOOTSTRAP_DRIVER=1
  fi
fi
```
[GitHub] spark pull request: [SPARK-4984][CORE][UI] Adding a pop-up contain...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-68243522 Hi @srowen, I am not an expert in CSS/JavaScript. My question is: if we do as you suggest (a mouseover), can we still copy the full SQL statement from the UI?
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3824#issuecomment-68243544 [Test build #24861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24861/consoleFull) for PR 3824 at commit [`a49c52f`](https://github.com/apache/spark/commit/a49c52fc995c7ac110d0ab07a4da2f87cf74de2d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3732#issuecomment-68243880 [Test build #24862 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24862/consoleFull) for PR 3732 at commit [`3b4d5d8`](https://github.com/apache/spark/commit/3b4d5d80dc716a9fe2782115399a77f171d66cc7). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4984][CORE][UI] Adding a pop-up contain...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-68243955 A mouseover can have whatever you like, whatever you can put in a `title` attribute. The browser will lay it out. You could try and see if it's an effective view.
[GitHub] spark pull request: HiveTableScan return mutable row with copy
GitHub user yanbohappy opened a pull request: https://github.com/apache/spark/pull/3827 HiveTableScan return mutable row with copy https://issues.apache.org/jira/browse/SPARK-4963 SchemaRDD.sample() returns wrong results because GapSamplingIterator operates on a mutable row. HiveTableScan builds an RDD of SpecificMutableRow, and SchemaRDD.sample() iterates it through a GapSamplingIterator:
```scala
override def next(): T = {
  val r = data.next()
  advance
  r
}
```
GapSamplingIterator.next() returns the current underlying element, assigning it to `r`. But if the underlying iterator yields a mutable row, as HiveTableScan's does, the underlying iterator and `r` point to the same object. The `advance` operation then drops some underlying elements, which also mutates `r` unexpectedly, so we return a value different from the one `r` initially held. The most direct fix is to have HiveTableScan return mutable rows with copy, as in the initial commit I have made. This solution means HiveTableScan cannot take full advantage of the reusable MutableRow, but it makes the sample operation return correct results. Further on, we could investigate making GapSamplingIterator.next() copy internally, but that would require every element an RDD can store to implement something like cloning, which would be a huge change.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/yanbohappy/spark spark-4963 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3827.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3827 commit 6eaee5e7b1b5aca7f6abd16892f8312c7d6d7917 Author: Yanbo Liang yanboha...@gmail.com Date: 2014-12-29T09:00:44Z HiveTableScan return mutable row with copy
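The aliasing bug described in this PR can be reproduced outside Spark. Below is a hypothetical Python sketch (all class and function names are invented, not Spark classes) of a scan that reuses one mutable row, and a sampling-style `next` that advances the underlying iterator before the caller reads its result:

```python
# Hypothetical sketch, not Spark code: all names here are invented.
class MutableRow:
    """Stands in for a reusable row buffer like SpecificMutableRow."""
    def __init__(self):
        self.values = []

class ReusingScan:
    """Stands in for a scan that yields the *same* row object every time."""
    def __init__(self, data):
        self.row = MutableRow()
        self.it = iter(data)
    def __iter__(self):
        return self
    def __next__(self):
        self.row.values = next(self.it)  # overwrite the shared buffer
        return self.row

def sample_next(it):
    """Mimics the GapSamplingIterator pattern: read, then advance past a gap."""
    r = next(it)     # r aliases the shared mutable row
    next(it, None)   # advancing mutates the row that r points to
    return r.values

def sample_next_copy(it):
    """Same, but with the defensive copy the PR proposes."""
    r = list(next(it).values)
    next(it, None)
    return r

print(sample_next(ReusingScan([[1], [2], [3]])))       # [2] -- wrong, expected [1]
print(sample_next_copy(ReusingScan([[1], [2], [3]])))  # [1] -- correct
```

The copy forfeits buffer reuse, exactly as the PR description notes, but it decouples the returned value from the iterator's internal state.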
[GitHub] spark pull request: [SPARK-4984][CORE][UI] Adding a pop-up contain...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-68244071 Thanks, I will try it :)
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3823#discussion_r22306764

--- Diff: bin/spark-submit ---
```diff
@@ -42,7 +42,10 @@ while (($#)); do
   shift
 done
 
-DEFAULT_PROPERTIES_FILE="$SPARK_HOME/conf/spark-defaults.conf"
+DEFAULT_PROPERTIES_FILE="$SPARK_CONF_DIR/spark-defaults.conf"
+if [ ! -f "$DEFAULT_PROPERTIES_FILE" ]; then
+  DEFAULT_PROPERTIES_FILE="$SPARK_HOME/conf/spark-defaults.conf"
+fi
 
 export SPARK_SUBMIT_DEPLOY_MODE="${SPARK_SUBMIT_DEPLOY_MODE:-client}"
 export SPARK_SUBMIT_PROPERTIES_FILE="${SPARK_SUBMIT_PROPERTIES_FILE:-$DEFAULT_PROPERTIES_FILE}"
```
--- End diff --

It's used here actually, but the purpose below seems to be detecting whether the user has set particular properties. Finding the default config doesn't matter here since it is a case where the user hasn't set these properties.
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3823#issuecomment-68244188 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24857/
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3823#issuecomment-68244186 [Test build #24857 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24857/consoleFull) for PR 3823 at commit [`c5a85eb`](https://github.com/apache/spark/commit/c5a85eb37389f3c849129267fcef0dfa608d09c6). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-4963 [SQL] HiveTableScan return mutable ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3827#issuecomment-68244170 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3824#issuecomment-68244528 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24858/
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3824#issuecomment-68244526 [Test build #24858 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24858/consoleFull) for PR 3824 at commit [`12eee85`](https://github.com/apache/spark/commit/12eee8590fb9899c267b29d3a129a169b6cf6ec1). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3732#issuecomment-68246138 [Test build #24862 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24862/consoleFull) for PR 3732 at commit [`3b4d5d8`](https://github.com/apache/spark/commit/3b4d5d80dc716a9fe2782115399a77f171d66cc7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `final class Date extends Ordered[Date] with Serializable `
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3732#issuecomment-68246141 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24862/
[GitHub] spark pull request: [SQL] enable view test
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3826#issuecomment-68246366 [Test build #24860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24860/consoleFull) for PR 3826 at commit [`f105f68`](https://github.com/apache/spark/commit/f105f68ef33381e272985866fb63a7e7775b76bb). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SQL] enable view test
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3826#issuecomment-68246371 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24860/
[GitHub] spark pull request: [SPARK-4991][CORE] Worker should reconnect to ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68246470 [Test build #24859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24859/consoleFull) for PR 3825 at commit [`e9c99e3`](https://github.com/apache/spark/commit/e9c99e3969f6e058e46d65575d796d1289351318). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class MasterDisconnected(masterUrl: String) extends DeployMessage`
[GitHub] spark pull request: [SPARK-4991][CORE] Worker should reconnect to ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68246474 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24859/
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user WangTaoTheTonic commented on a diff in the pull request: https://github.com/apache/spark/pull/3823#discussion_r22307986

--- Diff: bin/spark-submit ---
```diff
@@ -42,7 +42,10 @@ while (($#)); do
   shift
 done
 
-DEFAULT_PROPERTIES_FILE="$SPARK_HOME/conf/spark-defaults.conf"
+DEFAULT_PROPERTIES_FILE="$SPARK_CONF_DIR/spark-defaults.conf"
+if [ ! -f "$DEFAULT_PROPERTIES_FILE" ]; then
+  DEFAULT_PROPERTIES_FILE="$SPARK_HOME/conf/spark-defaults.conf"
+fi
 
 export SPARK_SUBMIT_DEPLOY_MODE="${SPARK_SUBMIT_DEPLOY_MODE:-client}"
 export SPARK_SUBMIT_PROPERTIES_FILE="${SPARK_SUBMIT_PROPERTIES_FILE:-$DEFAULT_PROPERTIES_FILE}"
```
--- End diff --

If the user didn't pass `--properties-file` and defined SPARK_CONF_DIR, spark-submit would still look for `spark.driver.extra*` in the file given by `DEFAULT_PROPERTIES_FILE=$SPARK_HOME/conf/spark-defaults.conf`. Obviously it will make the wrong judgement whenever SPARK_CONF_DIR does not equal `$SPARK_HOME/conf`.
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3823#discussion_r22308280

--- Diff: bin/spark-submit ---
```diff
@@ -42,7 +42,10 @@ while (($#)); do
   shift
 done
 
-DEFAULT_PROPERTIES_FILE="$SPARK_HOME/conf/spark-defaults.conf"
+DEFAULT_PROPERTIES_FILE="$SPARK_CONF_DIR/spark-defaults.conf"
+if [ ! -f "$DEFAULT_PROPERTIES_FILE" ]; then
+  DEFAULT_PROPERTIES_FILE="$SPARK_HOME/conf/spark-defaults.conf"
+fi
 
 export SPARK_SUBMIT_DEPLOY_MODE="${SPARK_SUBMIT_DEPLOY_MODE:-client}"
 export SPARK_SUBMIT_PROPERTIES_FILE="${SPARK_SUBMIT_PROPERTIES_FILE:-$DEFAULT_PROPERTIES_FILE}"
```
--- End diff --

OK, I see the scenario now where this matters. So the default config for an installation might in fact set these special properties, which the script needs to handle before `SparkSubmit` starts. Maybe someone else can double-check, but that makes sense to me.
[GitHub] spark pull request: [SPARK-4984][CORE][UI] Adding a pop-up contain...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-68248321 Hi @srowen, I tried it like this:
```
<div title="full sql">shortened sql</div>
```
but I cannot copy the full SQL. Or did I miss something?
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user WangTaoTheTonic commented on a diff in the pull request: https://github.com/apache/spark/pull/3823#discussion_r22308521

--- Diff: bin/spark-submit ---
```diff
@@ -42,7 +42,10 @@ while (($#)); do
   shift
 done
 
-DEFAULT_PROPERTIES_FILE="$SPARK_HOME/conf/spark-defaults.conf"
+DEFAULT_PROPERTIES_FILE="$SPARK_CONF_DIR/spark-defaults.conf"
+if [ ! -f "$DEFAULT_PROPERTIES_FILE" ]; then
+  DEFAULT_PROPERTIES_FILE="$SPARK_HOME/conf/spark-defaults.conf"
+fi
 
 export SPARK_SUBMIT_DEPLOY_MODE="${SPARK_SUBMIT_DEPLOY_MODE:-client}"
 export SPARK_SUBMIT_PROPERTIES_FILE="${SPARK_SUBMIT_PROPERTIES_FILE:-$DEFAULT_PROPERTIES_FILE}"
```
--- End diff --

I mean there is a possibility that the user does not use the default `conf` sub-directory of the installation but a specified one. For instance, I create a spark-defaults.conf under `/etc/my-spark/` and use it for submitting applications, so I set SPARK_CONF_DIR to `/etc/my-spark/` to make that properties file take effect. The properties file under `$SPARK_HOME/conf` could be unused, or used for submitting other applications.
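The fallback order being debated here, prefer `$SPARK_CONF_DIR/spark-defaults.conf` and otherwise use `$SPARK_HOME/conf/spark-defaults.conf`, can be sketched as follows (a hypothetical Python illustration; the function name and paths are invented, not part of Spark):

```python
import os
import tempfile

def default_properties_file(spark_conf_dir, spark_home):
    """Prefer the user conf dir's spark-defaults.conf; fall back to $SPARK_HOME/conf."""
    candidate = os.path.join(spark_conf_dir, "spark-defaults.conf")
    if os.path.isfile(candidate):
        return candidate
    return os.path.join(spark_home, "conf", "spark-defaults.conf")

with tempfile.TemporaryDirectory() as conf_dir:
    home = "/opt/spark"  # invented install dir
    # Conf dir has no defaults file yet: fall back to $SPARK_HOME/conf.
    assert default_properties_file(conf_dir, home) == \
        os.path.join(home, "conf", "spark-defaults.conf")
    # Once the user's conf dir provides the file, it takes precedence.
    path = os.path.join(conf_dir, "spark-defaults.conf")
    open(path, "w").close()
    assert default_properties_file(conf_dir, home) == path
```

WangTao's `/etc/my-spark/` scenario corresponds to the second assertion: the file in the user-specified conf dir wins over the one under the installation.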
[GitHub] spark pull request: [SPARK-4984][CORE][UI] Adding a pop-up contain...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-68248476 @scwf You should be able to put whatever you want in there as the `title`. What's the issue?
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3824#issuecomment-68248630 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24861/
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3824#issuecomment-68248623 [Test build #24861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24861/consoleFull) for PR 3824 at commit [`a49c52f`](https://github.com/apache/spark/commit/a49c52fc995c7ac110d0ab07a4da2f87cf74de2d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4984][CORE][UI] Adding a pop-up contain...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-68249501 Sorry, I didn't get you. Do you mean I should put something else here (implementing an attribute similar to ```title```) instead of ```title```?
[GitHub] spark pull request: [SPARK-4984][CORE][UI] Adding a pop-up contain...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-68250243 I'm suggesting something like replacing
```
<div><em>{lastStageDescription}</em></div>
```
with
```
<div title={lastStageDescription}><em>{shortLastStageDescription}</em></div>
```
... when the description is long. This should cause the short description to pop up the full description in a mouseover. Maybe I'm missing something as to why that won't work, but it is a lot simpler at least.
[GitHub] spark pull request: [SPARK-4994][network]Cleanup removed executors...
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/3828 [SPARK-4994][network]Cleanup removed executors' ShuffleInfo in yarn shuffle service When an application completes, YARN's NodeManager can remove the application's local dirs, but the metadata of the completed application's executors is not removed. As a result, the YARN shuffle service needs ever more memory to store executors' ShuffleInfo, so this metadata should be removed. You can merge this pull request into a Git repository by running: $ git pull https://github.com/lianhuiwang/spark SPARK-4994 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3828.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3828 commit f3ba1d283834b3583da829306a475781fb12ecb9 Author: lianhuiwang lianhuiwan...@gmail.com Date: 2014-12-29T12:34:38Z Cleanup removed executors' ShuffleInfo
[GitHub] spark pull request: [SPARK-4994][network]Cleanup removed executors...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3828#issuecomment-68254595 [Test build #24863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24863/consoleFull) for PR 3828 at commit [`f3ba1d2`](https://github.com/apache/spark/commit/f3ba1d283834b3583da829306a475781fb12ecb9). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4984][CORE][UI] Adding a pop-up contain...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3819#issuecomment-68254852 Hi @srowen, this and ```<div title="full sql">shortened sql</div>``` both work for me. The reason I did not use that solution is that we may sometimes want to copy (Ctrl+C) the full description from the UI, and with `title` we cannot copy the full description :) But it is really much simpler, so if you think it is very unlikely that a user will copy the full description, I will change to that simpler solution :)
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user OopsOutOfMemory commented on the pull request: https://github.com/apache/spark/pull/3823#issuecomment-68256422 Hi, @WangTaoTheTonic : ) This makes sense to me; Hadoop likewise has HADOOP_CONF_DIR. But I would prefer to check first whether the `SPARK_CONF_DIR` directory exists, not only the configuration file. If in the future many files under SPARK_CONF_DIR need to be added in spark-submit, you will need to check whether each file exists. You can do it like:
```shell
if [ ! -d "$SPARK_CONF_DIR" ]; then
  export SPARK_CONF_DIR="$SPARK_HOME/conf"
fi
```
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3823#issuecomment-68257882 @OopsOutOfMemory Thanks for your comment; I understand your concern. Actually it does not matter if SPARK_CONF_DIR does not exist here, because we can use `$SPARK_HOME/conf` instead, and the checking logic for the properties file subsumes the check on SPARK_CONF_DIR. In other places in the Spark code we usually use `getOrElse` logic to handle configuration. It makes things easier to analyse when a user gets some specific config wrong, and we had better not break that tradition. :)
[GitHub] spark pull request: [YARN][SPARK-4929] Bug fix: fix the yarn-clien...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3771#issuecomment-68260017 Can you please be a bit more specific and detail exactly what happens here? Are you referring to when the RM has to fail over, or during a rolling upgrade? Is the container brought down and then back up again? Please just describe the scenario and what exactly is happening.
[GitHub] spark pull request: [SPARK-4994][network]Cleanup removed executors...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3828#issuecomment-68260130 [Test build #24863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24863/consoleFull) for PR 3828 at commit [`f3ba1d2`](https://github.com/apache/spark/commit/f3ba1d283834b3583da829306a475781fb12ecb9). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4994][network]Cleanup removed executors...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3828#issuecomment-68260135 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24863/
[GitHub] spark pull request: [SPARK-4966][YARN]The MemoryOverhead value is ...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3797#issuecomment-68260624 Looks good. +1. Thanks @lianhuiwang
[GitHub] spark pull request: [SPARK-4966][YARN]The MemoryOverhead value is ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3797
[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...
Github user OopsOutOfMemory commented on the pull request: https://github.com/apache/spark/pull/3823#issuecomment-68261526 Sorry, maybe there is a misunderstanding here. What I mean is to **change the checking logic of the properties file** to check whether `SPARK_CONF_DIR` is user-specified or the default, not to add an extra directory check here. Let me give an example:
```shell
if [ ! -d "$SPARK_CONF_DIR" ]; then
  export SPARK_CONF_DIR="$SPARK_HOME/conf"
fi
DEFAULT_PROPERTIES_FILE="$SPARK_CONF_DIR/spark-defaults.conf"
XXX_PROPERTIES_FILE="$SPARK_CONF_DIR/xxx.conf"
```
Checking the conf directory is more reasonable, because the key point is `SPARK_CONF_DIR`; the original concern here is to change the path of `SPARK_CONF_DIR`, not only of `spark-defaults.conf`.

> analyse when a user gets some specific config wrong

We could add an extra warning for the key configuration file here, i.e. if `spark-defaults.conf` is missing, or is moved to a user-specific dir under the conf dir, we could log a warning to make the user aware of this before submitting. Currently, I think both solutions are OK! : )
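A minimal sketch of the getOrElse-style fallback discussed in this thread, combining both suggestions (the function name and warning text are hypothetical, not from either patch):

```shell
# Hypothetical helper: resolve the conf dir first, then warn (rather
# than fail) when the key configuration file is missing.
resolve_conf_dir() {
  # Fall back to the default conf dir when SPARK_CONF_DIR is unset
  # or does not exist -- the "getOrElse" logic described above.
  if [ ! -d "$SPARK_CONF_DIR" ]; then
    SPARK_CONF_DIR="$SPARK_HOME/conf"
  fi
  # Warn about a missing spark-defaults.conf before submitting.
  if [ ! -f "$SPARK_CONF_DIR/spark-defaults.conf" ]; then
    echo "Warning: no spark-defaults.conf in $SPARK_CONF_DIR" >&2
  fi
  echo "$SPARK_CONF_DIR"
}
```

Either way, the resolved directory is computed in one place, so any future file added under it needs no extra existence check of its own.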
[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...
Github user zapletal-martin commented on the pull request: https://github.com/apache/spark/pull/3519#issuecomment-68279092 1) Can you please clarify whether you are suggesting to use RDD[(Double, Double, Double)], i.e. label, feature, weight, or RDD[(Double, Double)], i.e. just label and weight, already expecting the data to be ordered? Also, I assume there should be an API with weight defaulting to 1 (so the user does not have to specify it). 2) IsotonicRegressionModel extends RegressionModel. It implements the methods predict(testData: RDD[Vector]) and predict(testData: Vector). Are these still relevant if we implement the changes in 1)? There would never be a Vector, just a Double. Also, we would need the feature from 1) to be able to predict the label. 3) How do you expect the Java API to look? Unfortunately the Java/Scala interop here is not very helpful. When the train method expects a tuple of scala.Double, then when called from Java you get: [error] IsotonicRegressionModel model = IsotonicRegression.train(testRDD.rdd(), true); [error] required: RDD<Tuple3<Object,Object,Object>>, boolean [error] found: RDD<Tuple3<Double,Double,Double>>, boolean [error] reason: actual argument RDD<Tuple3<Double,Double,Double>> cannot be converted to RDD<Tuple3<Object,Object,Object>> by method invocation conversion. There are solutions to this problem, but most of them are quite ugly. See for example http://stackoverflow.com/questions/17071061/scala-java-interoperability-how-to-deal-with-options-containing-int-long-primi or http://www.scala-notes.org/2011/04/specializing-for-primitive-types/. Is there another public Java API that uses a primitive type in a generic that I could use as a reference?
[GitHub] spark pull request: [MLlib]Vectors.sparse() add support to unsorte...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3791#issuecomment-68279740 @hzlyx As @srowen mentioned, this is a contract to avoid the expensive check. You can use https://github.com/hzlyx/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala#L191 if the indices are not ordered.
[GitHub] spark pull request: Added setMinCount to Word2Vec.scala
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3693#issuecomment-68279848 @ganonp Could you update the branch and remove the last commit?
[GitHub] spark pull request: SPARK-3955 part 2 [CORE] [HOTFIX] Different ve...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/3829 SPARK-3955 part 2 [CORE] [HOTFIX] Different versions between jackson-mapper-asl and jackson-core-asl @pwendell https://github.com/apache/spark/commit/2483c1efb6429a7d8a20c96d18ce2fec93a1aff9 didn't actually add a reference to `jackson-core-asl` as intended, but a second redundant reference to `jackson-mapper-asl`, as @markhamstra picked up on (https://github.com/apache/spark/pull/3716#issuecomment-68180192) This just rectifies the typo. I missed it as well; the original PR https://github.com/apache/spark/pull/2818 had it correct and I also didn't see the problem. You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark SPARK-3955 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3829.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3829 commit 6cfdc4e3dfe0a04e32a955bedffd5747fad9d70c Author: Sean Owen so...@cloudera.com Date: 2014-12-29T18:13:29Z Actually refer to jackson-core-asl
[GitHub] spark pull request: SPARK-3955 part 2 [CORE] [HOTFIX] Different ve...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3829#issuecomment-68282805 [Test build #24864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24864/consoleFull) for PR 3829 at commit [`6cfdc4e`](https://github.com/apache/spark/commit/6cfdc4e3dfe0a04e32a955bedffd5747fad9d70c). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-4921. TaskSetManager.dequeueTask returns...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/3816#issuecomment-68282936 Why was this a problem? You need to make sure that this won't change the locality level the scheduler launches tasks at due to delay scheduling. For example, if a stage contained both process-local and no-pref tasks, and it was still able to launch tasks locally (without the delay expiring), this change might make it forget that and not wait long enough, thus not getting local tasks. Please write down something explaining why this was a problem and why the fix won't break other things.
[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3707#issuecomment-68283433 @witgo commented on the actual commit: https://github.com/apache/spark/commit/a3e51cc990812c8099dcaf1f3bd6d5bae45cf8e6#commitcomment-9101060 "It seems that every time you run `./build/mvn` it has to re-download scala-2.10.4.tgz". Can you investigate?
[GitHub] spark pull request: SPARK-3955 part 2 [CORE] [HOTFIX] Different ve...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3829#issuecomment-68283989 Gotcha - thanks Sean.
[GitHub] spark pull request: SPARK-4921. TaskSetManager.dequeueTask returns...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3816#issuecomment-68284050 @mateiz the JIRA claims that this results in extra unnecessary locality delay. I thought that the problem might have been an obvious typo, but it sounds like you're saying this may have been the intended behavior. I'll look deeper into it.
[GitHub] spark pull request: spark-core - [SPARK-4787] - Stop sparkcontext ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3809#discussion_r22322821

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -329,8 +329,11 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
     try {
       dagScheduler = new DAGScheduler(this)
     } catch {
-      case e: Exception => throw
-        new SparkException("DAGScheduler cannot be initialized due to %s".format(e.getMessage))
+      case e: Exception => {
+        stop()
--- End diff --

Also, do you think this should be in a try-finally block so that we don't swallow the useful "DAGScheduler cannot be initialized" exception if the stop() call somehow fails?
[GitHub] spark pull request: [SPARK-4991][CORE] Worker should reconnect to ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68284859 Wouldn't it be better to ensure that actors like Master and DAGScheduler never die due to uncaught exceptions?
[GitHub] spark pull request: [SPARK-4982][DOC]The `spark.ui.retainedJobs` m...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3818#issuecomment-68284924 LGTM, thanks. I agree with Sean that this doesn't need a JIRA issue.
[GitHub] spark pull request: [SPARK-4982][DOC]The `spark.ui.retainedJobs` m...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3818
[GitHub] spark pull request: Adde LICENSE Header to build/mvn, build/sbt an...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3817#issuecomment-68285274 LGTM, thanks!
[GitHub] spark pull request: Added setMinCount to Word2Vec.scala
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3693#issuecomment-68285349 ok to test
[GitHub] spark pull request: Adde LICENSE Header to build/mvn, build/sbt an...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3817
[GitHub] spark pull request: Added setMinCount to Word2Vec.scala
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3693#issuecomment-68285678 [Test build #24865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24865/consoleFull) for PR 3693 at commit [`ad534f2`](https://github.com/apache/spark/commit/ad534f26c44a7bdc8ee91f73d80a93bd13aa6805). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/3707#issuecomment-68285768 @JoshRosen I can't reproduce this error and, after looking through the code, I'm not seeing where an issue like that could crop up :/ @witgo could you help me understand when you're seeing this and provide me the output of `bash -x ./build/mvn clean`? With that I can much better understand how to fix this.
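Since the symptom is a repeated download, the behavior under discussion presumably looks something like the following cache check (the function name, paths, and URL are illustrative assumptions, not the actual `build/mvn` code):

```shell
# Hypothetical sketch: fetch the tarball only when it is not already
# cached, which is what ./build/mvn is expected to do.
install_app() {
  local cache_dir="$1" tarball="$2" url="$3"
  mkdir -p "$cache_dir"
  if [ -f "$cache_dir/$tarball" ]; then
    # A cached copy exists; skip the network entirely.
    echo "using cached $tarball"
  else
    echo "downloading $tarball"
    curl --silent --location --output "$cache_dir/$tarball" "$url"
  fi
}
```

If the cache path computed on each run differs (for example because of an unset variable), the `-f` test always fails and the tarball is fetched every time, which would match the reported symptom.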
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3824#discussion_r22323425

--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -758,13 +760,14 @@ private[spark] class Master(
         // Event logging is enabled for this application, but no event logs are found
         val title = s"Application history not found (${app.id})"
         var msg = s"No event logs found for application $appName in $eventLogFile."
-        logWarning(msg)
+        val exception = URLEncoder.encode(Utils.exceptionString(fnf), "UTF-8")
+        logWarning(msg, fnf)
         msg += " Did you specify the correct logging directory?"
         msg = URLEncoder.encode(msg, "UTF-8")
-        app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&title=$title"
+        app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&exception=$exception&title=$title"
--- End diff --

@srowen It looks like this same `exception` URL param is used in other exception-handling code in this same file (the first instance was added by @andrewor14 in 6afca2d1079bac6309a595b8e0ffc74ae93fa662).
[GitHub] spark pull request: SPARK-4921. TaskSetManager.dequeueTask returns...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/3816#issuecomment-68285949 Well, what I'm saying is to look at how it affects the rest of the scheduler. That was set to PROCESS_LOCAL there for a reason; it wasn't a typo. It was to make sure that launching a no-pref task doesn't then cause you to increase your allowed locality level and miss waiting for other local ones. I'd also like to see what performance difference this makes in the original case, and why it was a problem there (e.g. was this an InputFormat with no locality info at all, or something). One fix, by the way, may be to not count NO_PREF launches at all when deciding how to update delay scheduling variables, but even then it's good to understand what this was doing and make sure it won't break it.
[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3702#issuecomment-68286086 @srowen Sorry for the delay! I'm really starting to wonder about this JIRA, though. The collect() should return one BinaryLabelCounter per partition. I'd assume people would have enough memory to store at least a few million BinaryLabelCounter instances on the driver. Does that mean they have more than a few million partitions? Sorry I didn't think about this earlier, and perhaps I'm just confusing myself now; let me know what you think. Is there an issue to solve here? Previously, I'd have said: "With the update, this LGTM." Also, I did think of one use case which may change things: we've been talking about people using these methods to make plots. Do you think people ever use them to choose thresholds? If so, then people might want much finer-grained ROC curves than we've been thinking, and it might be worthwhile to do a fancy implementation which avoids binning. At any rate, apologies for so much back-and-forth.
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3824#discussion_r22323517

--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -719,26 +719,28 @@ private[spark] class Master(
   def rebuildSparkUI(app: ApplicationInfo): Boolean = {
     val appName = app.desc.name
     val notFoundBasePath = HistoryServer.UI_PATH_PREFIX + "/not-found"
-    val eventLogFile = app.desc.eventLogDir
-      .map { dir => EventLoggingListener.getLogPath(dir, app.id) }
-      .getOrElse {
-        // Event logging is not enabled for this application
-        app.desc.appUiUrl = notFoundBasePath
-        return false
-      }
-    val fs = Utils.getHadoopFileSystem(eventLogFile, hadoopConf)
+    var eventLogFile: String = null
--- End diff --

It looks like `eventLogFile` is only read from inside the `try` block on the following line, so why not move it inside and make it a `val` instead?
[GitHub] spark pull request: [SPARK-4991][CORE] Worker should reconnect to ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68286943 More specifically, I guess I'm suggesting that we wrap the `receive` and `receiveWithLogging` methods of our actors in try-catch blocks to log any exceptions that bubble up to the top of the actors.
[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3519#issuecomment-68286932

2a) `(label: Double, feature: Double, weight: Double)` sounds good to me. We may add weight support to `LabeledPoint` as part of SPARK-3702, which should be orthogonal to this PR. We can update the API here (before 1.3) once that gets merged.

2b) Isotonic regression is a univariate regression algorithm, so it is not necessary to have its model extend `RegressionModel`. It should have `predict(RDD[Double])` and `predict(Double)`.

2c) Try `train(JavaPairRDD<java.lang.Double, java.lang.Double>)`.
[GitHub] spark pull request: [SPARK-4989][CORE] avoid wrong eventlog conf c...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3824#issuecomment-68287342 This change seems okay to me overall, aside from one minor nit. Most of the change is just broadening the scope of the `try` block to handle some cases that didn't seem like they could fail.
[GitHub] spark pull request: [SPARK-4893] Clean up uses of System.setProper...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3739#issuecomment-68287961 /cc @pwendell @andrewor14 @tdas Could one of you review this? It's blocking a couple of other PRs that I'd like to merge. This looks like a lot of changes, but they're isolated to test code and most cases are small, local changes to replace system property usage with SparkConf.
[GitHub] spark pull request: SPARK-4968: takeOrdered to skip reduce step in...
GitHub user saucam opened a pull request: https://github.com/apache/spark/pull/3830

SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions

`takeOrdered` should skip the reduce step when the mapped RDD has no partitions. This prevents the following exception, hit when running a query such as:

```
SELECT * FROM testTable WHERE market = 'market2' ORDER BY End_Time DESC LIMIT 100;
```

Error trace:

```
java.lang.UnsupportedOperationException: empty collection
    at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
    at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.reduce(RDD.scala:863)
    at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1136)
```

You can merge this pull request into a Git repository by running: $ git pull https://github.com/saucam/spark fix_takeorder Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3830.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3830 commit 5974d10c619dac2ca2433d331e43ed48e6822f90 Author: Yash Datta yash.da...@guavus.com Date: 2014-12-29T19:06:32Z SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions
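The failure mode and the fix's intent can be modelled outside Spark: `takeOrdered` has each partition contribute its own sorted top-k elements, then merges those contributions with `reduce`; if zero partitions contribute anything, a bare `reduce` throws on the empty collection, so the guard returns an empty result instead. This is a simplified sketch of that shape, not Spark's actual implementation:

```scala
object TakeOrderedSketch {
  // Each "partition" contributes its sorted top-k elements; the contributions
  // are merged pairwise. With zero non-empty partitions, calling reduce would
  // throw UnsupportedOperationException("empty collection"), so we check
  // first and short-circuit to an empty result.
  def takeOrdered(partitions: Seq[Seq[Int]], k: Int): Seq[Int] = {
    val tops = partitions.filter(_.nonEmpty).map(_.sorted.take(k))
    if (tops.isEmpty) Seq.empty // skip the reduce step entirely
    else tops.reduce((a, b) => (a ++ b).sorted.take(k))
  }
}
```

In the reported query, the `WHERE` filter leaves no matching rows, which is exactly the zero-contributions case the guard handles.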
[GitHub] spark pull request: SPARK-4968: takeOrdered to skip reduce step in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3830#issuecomment-68292137 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4946] [CORE] Using AkkaUtils.askWithRep...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3785#issuecomment-68292416 On the surface, this seems like an okay change. I wonder whether this retry logic could have unexpected consequences. Let me try to reason it out:

- `askTracker` is only called with `GetMapOutputStatuses`.
- In the master actor, this calls `getSerializedMapOutputStatuses`. That method never throws exceptions: if a shuffle is missing, it just stores an empty array and serializes it.
- It's possible that the serialized map statuses could exceed the Akka frame size (although that is extremely unlikely, and perhaps impossible with the new output status compression techniques). In that case the master would throw an exception and fail to send a reply back to the asker, so with this patch we'd end up performing a bunch of retries for an operation that will ultimately fail, and we'd take longer to detect the failure.

In the common cases, though, this seems fine, even if the map output statuses are missing (since it won't introduce a bunch of futile retries). Therefore, I think we should pull this in; I don't know if this fixes an actual bug, but it seems like it could make things more robust.
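The trade-off in this reasoning is generic to any ask-with-retries helper: a transient failure is papered over, but a deterministic failure just costs `maxAttempts` times the single-try latency before it surfaces. A minimal generic sketch of that shape (this is not Spark's `AkkaUtils.askWithReply`, just an illustration of its semantics):

```scala
object RetrySketch {
  // Retries op up to maxAttempts times, returning the first successful
  // result. If every attempt fails, the last exception is rethrown -- so a
  // deterministic failure is detected maxAttempts times slower than with a
  // single try, which is the downside discussed above.
  def withRetries[T](maxAttempts: Int)(op: () => T): T = {
    var lastError: Throwable = null
    var attempt = 0
    while (attempt < maxAttempts) {
      try return op()
      catch { case e: Exception => lastError = e }
      attempt += 1
    }
    throw lastError
  }
}
```

The conclusion above follows from this shape: since the master-side handler essentially never fails, the retries are almost always a no-op on the first attempt, and the slow-failure path is practically unreachable.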
[GitHub] spark pull request: [SPARK-4946] [CORE] Using AkkaUtils.askWithRep...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3785#issuecomment-68293600 Alright, I'm going to merge this into `master` (1.3.0). Thanks!
[GitHub] spark pull request: [SPARK-4946] [CORE] Using AkkaUtils.askWithRep...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3785
[GitHub] spark pull request: [SPARK-4961] [CORE] Put HadoopRDD.getPartition...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3794#issuecomment-68294158 To reformat the PR description to make it a little easier to read:

`HadoopRDD.getPartitions` is lazy, so it runs while DAGScheduler is processing a JobSubmitted event. If the input directory is large, getPartitions may take a long time; for example, in our cluster it takes anywhere from 0.029s to 766.699s. While one JobSubmitted event is being processed, the others must wait. Thus, we want to move HadoopRDD.getPartitions earlier to reduce the JobSubmitted processing time, so that other JobSubmitted events don't need to wait as long. A HadoopRDD object could compute its partitions when it is instantiated. We can analyse and compare the execution time before and after the optimization:

```
TaskScheduler.start execution time: [time1__]
DAGScheduler.JobSubmitted (excluding HadoopRDD.getPartitions or TaskScheduler.start) execution time: [time2_]
HadoopRDD.getPartitions execution time: [time3___]
Stages execution time: [time4_]
```

(1) The app has only one job:

```
The execution time of the job before optimization is [time1__][time2_][time3___][time4_].
The execution time of the job after optimization is  [time1__][time3___][time2_][time4_].
```

In summary, if the app has only one job, the total execution time is the same before and after the optimization.

(2) The app has 4 jobs. Before optimization:

```
job1 execution time is [time2_][time3___][time4_],
job2 execution time is [time2__][time3___][time4_],
job3 execution time is [time2][time3___][time4_],
job4 execution time is [time2_][time3___][time4_].
```

After optimization:

```
job1 execution time is [time3___][time2_][time4_],
job2 execution time is [time3___][time2__][time4_],
job3 execution time is [time3___][time2_][time4_],
job4 execution time is [time3___][time2__][time4_].
```

In summary, if the app has multiple jobs, the average execution time after the optimization is less than before.
[GitHub] spark pull request: SPARK-4963 [SQL] HiveTableScan return mutable ...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3827#issuecomment-68294587 ok to test