[GitHub] spark pull request: [SPARK-6392][SQL]Minor fix ClassNotFound excep...

2015-03-19 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/5079#issuecomment-83359940
  
The tests in the two PRs are different: this PR is about the UDF jar, while
#4586 is about the SerDe jar. They may be loaded by different class loaders.

@jeanlyn can you paste the full code for the UDF function?
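
A hedged aside on the failure mode under discussion: two jars added through different class loaders cannot see each other's classes. A minimal, self-contained sketch (the jar paths and class name are invented for illustration):

```scala
import java.net.{URL, URLClassLoader}

// Hypothetical jars; in the PRs above these would be the UDF and SerDe jars.
val udfJar   = new URL("file:/tmp/udf.jar")
val serdeJar = new URL("file:/tmp/serde.jar")

// Two sibling loaders whose common parent knows about neither jar.
val udfLoader   = new URLClassLoader(Array(udfJar), getClass.getClassLoader)
val serdeLoader = new URLClassLoader(Array(serdeJar), getClass.getClassLoader)

udfLoader.loadClass("hello")    // resolves: the class lives in udf.jar
serdeLoader.loadClass("hello")  // throws ClassNotFoundException
```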





[GitHub] spark pull request: [SPARK-4012] stop SparkContext when the except...

2015-03-19 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/5004#issuecomment-83369196
  
Cool, merging this into master. Thanks!





[GitHub] spark pull request: [SPARK-6392][SQL]Minor fix ClassNotFound excep...

2015-03-19 Thread adrian-wang
Github user adrian-wang commented on the pull request:

https://github.com/apache/spark/pull/5079#issuecomment-83372666
  
@jeanlyn we are not seeing the same thing; even our .q files differ. I don't
have CHAR in my .q file.





[GitHub] spark pull request: [SPARK-6222][Streaming] Dont delete checkpoint...

2015-03-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5008





[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4491#issuecomment-83371597
  
  [Test build #28855 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28855/consoleFull) for PR 4491 at commit [`072c39b`](https://github.com/apache/spark/commit/072c39b26583c9793ec5e94b8430a903c84b1d91).
 * This patch **fails Scala style tests**.
 * This patch **does not merge cleanly**.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class AesCtrCryptoCodec extends CryptoCodec `
  * `case class CipherSuite(name: String, algoBlockSize: Int) `
  * `abstract case class CryptoCodec() `
  * `class CryptoInputStream(in: InputStream, codecVal: CryptoCodec,`
  * `class CryptoOutputStream(out: OutputStream, codecVal: CryptoCodec, bufferSizeVal: Int,`
  * `trait Decryptor `
  * `trait Encryptor `
  * `class JceAesCtrCryptoCodec(conf:SparkConf) extends AesCtrCryptoCodec with Logging `
  * `  class JceAesCtrCipher(mode: Int, provider: String) extends Encryptor with Decryptor `
  * `class OpensslAesCtrCryptoCodec(conf:SparkConf) extends AesCtrCryptoCodec with Logging `






[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...

2015-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4491#issuecomment-83371598
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28855/
Test FAILed.





[GitHub] spark pull request: [MLLib]SPARK-6348:Enable useFeatureScaling in ...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5055#issuecomment-83361565
  
  [Test build #28854 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28854/consoleFull) for PR 5055 at commit [`2dc9cb8`](https://github.com/apache/spark/commit/2dc9cb886eaaf27f3bdf761b17da18692ead0906).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...

2015-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4491#issuecomment-83381772
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28858/
Test FAILed.





[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4491#issuecomment-83381297
  
  [Test build #28858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28858/consoleFull) for PR 4491 at commit [`2278b48`](https://github.com/apache/spark/commit/2278b48cb7b7bd306432f3f459212fed5b1cf3bd).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4012] stop SparkContext when the except...

2015-03-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5004





[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4491#issuecomment-83381766
  
  [Test build #28858 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28858/consoleFull) for PR 4491 at commit [`2278b48`](https://github.com/apache/spark/commit/2278b48cb7b7bd306432f3f459212fed5b1cf3bd).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class AesCtrCryptoCodec extends CryptoCodec `
  * `case class CipherSuite(name: String, algoBlockSize: Int) `
  * `abstract case class CryptoCodec() `
  * `class CryptoInputStream(in: InputStream, codecVal: CryptoCodec,`
  * `class CryptoOutputStream(out: OutputStream, codecVal: CryptoCodec, bufferSizeVal: Int,`
  * `trait Decryptor `
  * `trait Encryptor `
  * `class JceAesCtrCryptoCodec(conf:SparkConf) extends AesCtrCryptoCodec with Logging `
  * `  class JceAesCtrCipher(mode: Int, provider: String) extends Encryptor with Decryptor `
  * `class OpensslAesCtrCryptoCodec(conf:SparkConf) extends AesCtrCryptoCodec with Logging `






[GitHub] spark pull request: [SPARK-6354][SQL] Replace the plan which is pa...

2015-03-19 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/5044#issuecomment-83384314
  
@marmbrus I have updated the design on the JIRA.





[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4491#issuecomment-83370946
  
  [Test build #28855 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28855/consoleFull) for PR 4491 at commit [`072c39b`](https://github.com/apache/spark/commit/072c39b26583c9793ec5e94b8430a903c84b1d91).
 * This patch **does not merge cleanly**.





[GitHub] spark pull request: [SPARK-6392][SQL]Minor fix ClassNotFound excep...

2015-03-19 Thread jeanlyn
Github user jeanlyn commented on the pull request:

https://github.com/apache/spark/pull/5079#issuecomment-83372419
  
@chenghao-intel my full code is:
```java
import org.apache.hadoop.hive.ql.exec.UDF;

public class hello extends UDF {
    public String evaluate(String str) {
        try {
            return "hello " + str;
        } catch (Exception e) {
            return null;
        }
    }
}
```
@adrian-wang, I also tested `mapjoin_addjar.q` in `spark-sql`.
I got the exception on `CREATE TABLE`:
```
15/03/19 14:41:36 ERROR DDLTask: java.lang.NoSuchFieldError: CHAR
at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:310)
at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:277)
at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
```
But it does not seem to be a jar-loading problem, because when I do not run the
```
add jar ${system:maven.local.repository}/org/apache/hive/hcatalog/hive-hcatalog-core/${system:hive.version}/hive-hcatalog-core-${system:hive.version}.jar;
```
I get the following exception:
```
15/03/19 14:54:51 ERROR DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot validate serde: org.apache.hive.hcatalog.data.JsonSerDe
at org.apache.hadoop.hive.ql.exec.DDLTask.validateSerDe(DDLTask.java:3423)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3553)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:252)
```





[GitHub] spark pull request: [SPARK-4449][Core] Specify port range in spark

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3314#issuecomment-83372401
  
  [Test build #28856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28856/consoleFull) for PR 3314 at commit [`6e609da`](https://github.com/apache/spark/commit/6e609daecc2c22c8a2123c628c0deca886b167a6).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...

2015-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4491#issuecomment-83379582
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28857/
Test FAILed.





[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4491#issuecomment-83379060
  
  [Test build #28857 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28857/consoleFull) for PR 4491 at commit [`0d759d1`](https://github.com/apache/spark/commit/0d759d129213079da49714183a03fcbd97acc180).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4491#issuecomment-83379569
  
  [Test build #28857 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28857/consoleFull) for PR 4491 at commit [`0d759d1`](https://github.com/apache/spark/commit/0d759d129213079da49714183a03fcbd97acc180).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class AesCtrCryptoCodec extends CryptoCodec `
  * `case class CipherSuite(name: String, algoBlockSize: Int) `
  * `abstract case class CryptoCodec() `
  * `class CryptoInputStream(in: InputStream, codecVal: CryptoCodec,`
  * `class CryptoOutputStream(out: OutputStream, codecVal: CryptoCodec, bufferSizeVal: Int,`
  * `trait Decryptor `
  * `trait Encryptor `
  * `class JceAesCtrCryptoCodec(conf:SparkConf) extends AesCtrCryptoCodec with Logging `
  * `  class JceAesCtrCipher(mode: Int, provider: String) extends Encryptor with Decryptor `
  * `class OpensslAesCtrCryptoCodec(conf:SparkConf) extends AesCtrCryptoCodec with Logging `






[GitHub] spark pull request: [SPARK-6392][SQL]Minor fix ClassNotFound excep...

2015-03-19 Thread jeanlyn
Github user jeanlyn commented on the pull request:

https://github.com/apache/spark/pull/5079#issuecomment-83383402
  
I also don't have CHAR in `mapjoin_addjar.q`. I can only find one
`mapjoin_addjar.q`, and the path of my file is

sql/hive/src/test/resources/ql/src/test/queries/clientpositive/mapjoin_addjar.q
```sql
set hive.auto.convert.join=true;
set hive.auto.convert.join.use.nonstaged=false;

add jar ${system:maven.local.repository}/org/apache/hive/hcatalog/hive-hcatalog-core/${system:hive.version}/hive-hcatalog-core-${system:hive.version}.jar;

CREATE TABLE t1 (a string, b string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
;
LOAD DATA LOCAL INPATH '../../data/files/sample.json' INTO TABLE t1;
select * from src join t1 on src.key =t1.a;
drop table t1;
set hive.auto.convert.join=false;

```
Maybe we can discuss this offline?





[GitHub] spark pull request: [SPARK-4449][Core] Specify port range in spark

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3314#issuecomment-83372560
  
  [Test build #28856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28856/consoleFull) for PR 3314 at commit [`6e609da`](https://github.com/apache/spark/commit/6e609daecc2c22c8a2123c628c0deca886b167a6).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  class CheckpointWriteHandler(`






[GitHub] spark pull request: [SPARK-4449][Core] Specify port range in spark

2015-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3314#issuecomment-83372561
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28856/
Test FAILed.





[GitHub] spark pull request: Tighten up field/method visibility in Executor...

2015-03-19 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4850#discussion_r26803803
  
--- Diff: 
core/src/main/scala/org/apache/spark/executor/CommitDeniedException.scala ---
@@ -22,14 +22,12 @@ import org.apache.spark.{TaskCommitDenied, 
TaskEndReason}
 /**
  * Exception thrown when a task attempts to commit output to HDFS but is 
denied by the driver.
  */
-class CommitDeniedException(
+private[spark] class CommitDeniedException(
--- End diff --

Since this was inadvertently public before, and thus was public in Spark 
1.3, I think that this change will cause a MiMa failure once we bump the 
version to 1.4.0-SNAPSHOT.  Therefore, this PR sort of implicitly conflicts 
with #5056, so we'll have to make sure to re-test whichever PR we merge second.
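
For reference, a sketch of the kind of exclusion such a visibility change usually needs in `project/MimaExcludes.scala` (the exact rule name is an assumption based on MiMa's core API):

```scala
import com.typesafe.tools.mima.core._

// Hypothetical entry: tell MiMa the 1.3-public class is intentionally gone
// from the 1.4 public API after becoming private[spark].
ProblemFilters.exclude[MissingClassProblem](
  "org.apache.spark.executor.CommitDeniedException")
```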





[GitHub] spark pull request: Aditional information for users building from ...

2015-03-19 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5092#issuecomment-83789238
  
OK, I am convinced; merge it. I think both Hive profiles are needed in this
example?
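
For context, a hedged example of a build invocation with both Hive profiles enabled (the exact flags are an assumption about the Spark build of this era):

```
./build/mvn -Phive -Phive-thriftserver -DskipTests clean package
```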





[GitHub] spark pull request: Tighten up field/method visibility in Executor...

2015-03-19 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4850#discussion_r26804299
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -156,12 +160,19 @@ private[spark] class Executor(
   serializedTask: ByteBuffer)
 extends Runnable {
 
+/** Whether this task has been killed. */
 @volatile private var killed = false
-@volatile var task: Task[Any] = _
-@volatile var attemptedTask: Option[Task[Any]] = None
--- End diff --

This `attemptedTask` vs `task` stuff in the old code is/was really 
confusing, so thanks for cleaning it up.





[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...

2015-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5093#issuecomment-83796187
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28896/
Test FAILed.





[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5093#issuecomment-83796166
  
  [Test build #28896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28896/consoleFull) for PR 5093 at commit [`126ce61`](https://github.com/apache/spark/commit/126ce61580d805b464f7a4534d0be05411ff0e4b).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5093#issuecomment-83796175
  
  [Test build #28896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28896/consoleFull) for PR 5093 at commit [`126ce61`](https://github.com/apache/spark/commit/126ce61580d805b464f7a4534d0be05411ff0e4b).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5654] Integrate SparkR into Apache Spar...

2015-03-19 Thread davies
Github user davies closed the pull request at:

https://github.com/apache/spark/pull/5077





[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5093#issuecomment-83775810
  
  [Test build #2 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/2/consoleFull) for PR 5093 at commit [`94d3547`](https://github.com/apache/spark/commit/94d35478c8205386ac4ff0e265a0bfbb073bc8c7).
 * This patch merges cleanly.





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an Arti...

2015-03-19 Thread debasish83
Github user debasish83 commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-83776903
  
Yeah, if it can be parallelized by data, it's best to do that and avoid GraphX
joins, because with GraphX the painful part is balancing the graph, and most of
the time that step needs more work than everything else :-(





[GitHub] spark pull request: Tighten up field/method visibility in Executor...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4850#issuecomment-83783774
  
  [Test build #28892 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28892/consoleFull) for PR 4850 at commit [`866fc60`](https://github.com/apache/spark/commit/866fc60652c3f98c6c608ca6c25c33f4219a540c).
 * This patch merges cleanly.





[GitHub] spark pull request: [Core] SPARK-5954: Top by key

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5075#issuecomment-83783526
  
  [Test build #28891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28891/consoleFull) for PR 5075 at commit [`82dded9`](https://github.com/apache/spark/commit/82dded96926f98d8a72cf40cbbc6987b191962f0).
 * This patch merges cleanly.





[GitHub] spark pull request: [Core] SPARK-5954: Top by key

2015-03-19 Thread coderxiang
Github user coderxiang commented on the pull request:

https://github.com/apache/spark/pull/5075#issuecomment-83783677
  
@rxin @mengxr per the comments, I created `MLPairRDDFunctions.scala` and 
moved the function there in the update.
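
A minimal usage sketch of the relocated helper (the implicit import and semantics are assumed from the PR discussion; values come back per key in descending order):

```scala
import org.apache.spark.mllib.rdd.MLPairRDDFunctions.fromPairRDD

// Assumes an existing SparkContext `sc`.
val pairs = sc.parallelize(Seq(("a", 3), ("a", 1), ("a", 2), ("b", 5)))
// Keep the top 2 values for each key.
val top2 = pairs.topByKey(2).collect()
// expected: Array(("a", Array(3, 2)), ("b", Array(5)))
```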





[GitHub] spark pull request: Tighten up field/method visibility in Executor...

2015-03-19 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4850#issuecomment-83787927
  
LGTM.





[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...

2015-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5093#issuecomment-83788993
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28894/
Test FAILed.





[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5093#issuecomment-83788987
  
  [Test build #28894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28894/consoleFull) for PR 5093 at commit [`63a35c9`](https://github.com/apache/spark/commit/63a35c908598074bb0acb2e310a4905fe28502a0).
 * This patch merges cleanly.





[GitHub] spark pull request: Driver's Block Manager does not use spark.dri...

2015-03-19 Thread marsishandsome
GitHub user marsishandsome opened a pull request:

https://github.com/apache/spark/pull/5095

Driver's Block Manager does not use spark.driver.host in Yarn-Client mode

In my cluster, the YARN nodes do not know the client's host name, so I set
spark.driver.host to the client's IP address. But in Yarn-Client mode the
driver's Block Manager uses the hostname rather than spark.driver.host
(a minimal configuration sketch follows the stack trace below).

I got the following error:

 TaskSetManager: Lost task 1.1 in stage 0.0 (TID 2, hadoop-node1538098): java.io.IOException: Failed to connect to example-hostname
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:127)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:644)
at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:193)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:200)
at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1029)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:496)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:481)
at io.netty.channel.ChannelOutboundHandlerAdapter.connect(ChannelOutboundHandlerAdapter.java:47)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:496)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:481)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:463)
at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:849)
at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:199)
at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:165)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
... 1 more
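
A minimal configuration sketch of the workaround described above (the IP address is an assumed placeholder for the client's routable address):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("yarn-client-driver-host")
  .setMaster("yarn-client")
  .set("spark.driver.host", "10.0.0.5")  // what this PR wants the Block Manager to honor
val sc = new SparkContext(conf)
```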


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marsishandsome/spark Spark6420

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5095.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5095


commit 2f9701d182eecc814df1730cb659fbe1622d1288
Author: guliangliang guliangli...@qiyi.com
Date:   2015-03-19T23:11:17Z

[SPARK-6420] Driver's Block Manager does not use spark.driver.host in 
Yarn-Client mode







[GitHub] spark pull request: [SPARK-5654] Integrate SparkR

2015-03-19 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5096#discussion_r26809195
  
--- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala ---
@@ -0,0 +1,515 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.api.r
+
+import java.io._
+import java.net.ServerSocket
+import java.util.{Map => JMap}
+
+import scala.collection.JavaConversions._
+import scala.io.Source
+import scala.reflect.ClassTag
+import scala.util.Try
+
+import org.apache.spark.api.java.{JavaPairRDD, JavaRDD, JavaSparkContext}
+import org.apache.spark.broadcast.Broadcast
+import org.apache.spark.rdd.RDD
+import org.apache.spark._
+
+private abstract class BaseRRDD[T: ClassTag, U: ClassTag](
+parent: RDD[T],
+numPartitions: Int,
+func: Array[Byte],
+deserializer: String,
+serializer: String,
+packageNames: Array[Byte],
+rLibDir: String,
+broadcastVars: Array[Broadcast[Object]])
+  extends RDD[U](parent) with Logging {
+  override def getPartitions = parent.partitions
+
+  override def compute(split: Partition, context: TaskContext): Iterator[U] = {
+
+// The parent may be also an RRDD, so we should launch it first.
+val parentIterator = firstParent[T].iterator(split, context)
+
+// we expect two connections
+val serverSocket = new ServerSocket(0, 2)
+val listenPort = serverSocket.getLocalPort()
+
+// The stdout/stderr is shared by multiple tasks, because we use one daemon
+// to launch child process as worker.
+val errThread = RRDD.createRWorker(rLibDir, listenPort)
+
+// We use two sockets to separate input and output, then it's easy to manage
+// the lifecycle of them to avoid deadlock.
+// TODO: optimize it to use one socket
+
+// the socket used to send out the input of task
+serverSocket.setSoTimeout(1)
+val inSocket = serverSocket.accept()
+startStdinThread(inSocket.getOutputStream(), parentIterator, split.index)
+
+// the socket used to receive the output of task
+val outSocket = serverSocket.accept()
+val inputStream = new BufferedInputStream(outSocket.getInputStream)
+val dataStream = openDataStream(inputStream)
+serverSocket.close()
+
+try {
+
+  return new Iterator[U] {
+def next(): U = {
+  val obj = _nextObj
+  if (hasNext) {
+_nextObj = read()
+  }
+  obj
+}
+
+var _nextObj = read()
+
+def hasNext(): Boolean = {
+  val hasMore = (_nextObj != null)
+  if (!hasMore) {
+dataStream.close()
+  }
+  hasMore
+}
+  }
+} catch {
+  case e: Exception =
+throw new SparkException("R computation failed with\n " + errThread.getLines())
+}
+  }
+
+  /**
+   * Start a thread to write RDD data to the R process.
+   */
+  private def startStdinThread[T](
+output: OutputStream,
+iter: Iterator[T],
+splitIndex: Int) = {
--- End diff --

I think that Spark has migrated away from "split" in favor of "partition", so
it would be nice to update the occurrences of "split" in this PR to be
consistent with that.





[GitHub] spark pull request: [SQL] Checking data types when resolving types

2015-03-19 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/4685#issuecomment-83823596
  
@kai-zeng in `HiveTypeCoercion`, there are lots of rules to guarantee/produce
the correct data type for built-in expressions like the one here. I mean,
instead of adding a check here, can we just keep updating/adding rules in
`HiveTypeCoercion`? In most cases, what people need is the correct cast, not a
thrown exception, right?
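
A hedged, simplified sketch (not Spark's actual rule) of the `HiveTypeCoercion`-style approach being suggested, where the plan is rewritten to insert casts instead of failing later:

```scala
import org.apache.spark.sql.catalyst.expressions.{Add, Cast}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.{DoubleType, StringType}

object PromoteStringsInAdd extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    // A string operand of + gets cast to double rather than rejected.
    case Add(l, r) if l.dataType == StringType =>
      Add(Cast(l, DoubleType), r)
  }
}
```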





[GitHub] spark pull request: [SPARK-6406] Launcher backward compatibility i...

2015-03-19 Thread nishkamravi2
GitHub user nishkamravi2 reopened a pull request:

https://github.com/apache/spark/pull/5085

[SPARK-6406] Launcher backward compatibility issue-- hadoop should not be 
mandatory in spark assembly name



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nishkamravi2/spark master_nravi

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5085.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5085


commit 681b36f5fb63e14dc89e17813894227be9e2324f
Author: nravi nr...@c1704.halxg.cloudera.com
Date:   2014-05-08T07:05:33Z

Fix for SPARK-1758: failing test 
org.apache.spark.JavaAPISuite.wholeTextFiles

The prefix "file:" is missing in the string inserted as key in HashMap

commit 5108700230fd70b995e76598f49bdf328c971e77
Author: nravi nr...@c1704.halxg.cloudera.com
Date:   2014-06-03T22:25:22Z

Fix in Spark for the Concurrent thread modification issue (SPARK-1097, 
HADOOP-10456)

commit 6b840f017870207d23e75de224710971ada0b3d0
Author: nravi nr...@c1704.halxg.cloudera.com
Date:   2014-06-03T22:34:02Z

Undo the fix for SPARK-1758 (the problem is fixed)

commit df2aeb179fca4fc893803c72a657317f5b5539d7
Author: nravi nr...@c1704.halxg.cloudera.com
Date:   2014-06-09T19:02:59Z

Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456)

commit eb663ca20c73f9c467192c95fc528c6f55f202be
Author: nravi nr...@c1704.halxg.cloudera.com
Date:   2014-06-09T19:04:39Z

Merge branch 'master' of https://github.com/apache/spark

commit 5423a03ddf4d747db7261d08a64e32f44e8be95e
Author: nravi nr...@c1704.halxg.cloudera.com
Date:   2014-06-10T20:06:07Z

Merge branch 'master' of https://github.com/apache/spark

commit 3bf8fad85813037504189cf1323d381fefb6dfbe
Author: nravi nr...@c1704.halxg.cloudera.com
Date:   2014-06-16T05:47:00Z

Merge branch 'master' of https://github.com/apache/spark

commit 2b630f94079b82df3ebae2b26a3743112afcd526
Author: nravi nr...@c1704.halxg.cloudera.com
Date:   2014-06-16T06:00:31Z

Accept memory input as 30g, 512M instead of an int value, to be 
consistent with rest of Spark

commit efd688a4e15b79e92d162073035b03362fcf66f0
Author: Nishkam Ravi nr...@cloudera.com
Date:   2014-07-13T00:04:17Z

Merge branch 'master' of https://github.com/apache/spark

commit 2e69f112d1be59951cd32da4127d8b51bfa03338
Author: Nishkam Ravi nr...@cloudera.com
Date:   2014-09-21T23:17:15Z

Merge branch 'master' of https://github.com/apache/spark into master_nravi

commit ebcde10252e6c45169ea086e8426ec9997d46490
Author: Nishkam Ravi nr...@cloudera.com
Date:   2014-09-22T06:44:40Z

Modify default YARN memory_overhead-- from an additive constant to a 
multiplier (redone to resolve merge conflicts)

commit 1cf2d1ef57ed6d783df06dad36b9505bc74329fb
Author: nishkamravi2 nishkamr...@gmail.com
Date:   2014-09-22T08:54:33Z

Update YarnAllocator.scala

commit f00fa311945c1eafa8957eae5c84719521761dcd
Author: Nishkam Ravi nr...@cloudera.com
Date:   2014-09-22T23:06:07Z

Improving logging for AM memoryOverhead

commit c726bd9f707ce182ec8d56ffecf9da87dcdb3091
Author: Nishkam Ravi nr...@cloudera.com
Date:   2014-09-25T01:19:32Z

Merge branch 'master' of https://github.com/apache/spark into master_nravi

Conflicts:
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala

commit 362da5edfd04bd8bad990fb210a9e11b8494fa62
Author: Nishkam Ravi nr...@cloudera.com
Date:   2014-09-25T19:56:13Z

Additional changes for yarn memory overhead

commit 42c2c3d18862d3632c20931ecfe2c64883c5febf
Author: Nishkam Ravi nr...@cloudera.com
Date:   2014-09-25T20:02:49Z

Additional changes for yarn memory overhead issue

commit dac1047995c99f5a2670f934eb8d3a4ad9b532c8
Author: Nishkam Ravi nr...@cloudera.com
Date:   2014-09-25T21:20:38Z

Additional documentation for yarn memory overhead issue

commit 5ac2ec11629e19030ad5577da1eee2d135cc3d1c
Author: Nishkam Ravi nr...@cloudera.com
Date:   2014-09-25T21:25:44Z

Remove out

commit 35daa6498048cabb736316e2f19e565c99243b7e
Author: Nishkam Ravi nr...@cloudera.com
Date:   2014-09-25T21:59:22Z

Slight change in the doc for yarn memory overhead

commit 8f76c8b46379736aeb7dbe1a4d88729424a041f7
Author: Nishkam Ravi nr...@cloudera.com
Date:   2014-09-25T22:03:00Z

Doc change for yarn memory overhead

commit 636a9ffeb4a4ae0b941edd849dcbabf38821db53
Author: nishkamravi2 nishkamr...@gmail.com
Date:   2014-09-30T18:33:28Z

Update YarnAllocator.scala

commit 5f8f9ede0fda5c7a4f6a411c746a3d893f550524
Author: Nishkam Ravi nr...@cloudera.com
Date:   2014-11-19T01:46:58Z

Merge branch 'master' of https://github.com/apache/spark into master_nravi

Conflicts:

yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala

[GitHub] spark pull request: [SPARK-6406] Launcher backward compatibility i...

2015-03-19 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/5085#issuecomment-83825207
  
And btw, we need to check this in 





[GitHub] spark pull request: [SPARK-6406] Launcher backward compatibility i...

2015-03-19 Thread nishkamravi2
Github user nishkamravi2 closed the pull request at:

https://github.com/apache/spark/pull/5085





[GitHub] spark pull request: [SPARK-6367][SQL] Use the proper data type for...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5094#issuecomment-83778257
  
  [Test build #28889 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28889/consoleFull) for PR 5094 at commit [`a384b51`](https://github.com/apache/spark/commit/a384b510c0cbfbf44855d2939aae737c26c20c85).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-6367][SQL] Use the proper data type for...

2015-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5094#issuecomment-83778614
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28889/
Test FAILed.





[GitHub] spark pull request: [SPARK-6367][SQL] Use the proper data type for...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5094#issuecomment-83778607
  
  [Test build #28889 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28889/consoleFull) for PR 5094 at commit [`a384b51`](https://github.com/apache/spark/commit/a384b510c0cbfbf44855d2939aae737c26c20c85).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [MLlib] SPARK-5954: Top by key

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5075#issuecomment-83795883
  
  [Test build #28897 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28897/consoleFull) for PR 5075 at commit [`a80e0ec`](https://github.com/apache/spark/commit/a80e0ecd0ce96ffbeeaeb933dea1cada60e5863c).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5177][Build] Adds parameters for specif...

2015-03-19 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/3980#issuecomment-83797419
  
@srowen Don't worry, I'm gradually merging changes from this PR into #4851. An
[experimental Jenkins builder][1] was also set up for this. These are still
WIP because some Hive 12 tests are still failing. I'm closing this one.

[1]: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT_experimental/





[GitHub] spark pull request: [SPARK-5177][Build] Adds parameters for specif...

2015-03-19 Thread liancheng
Github user liancheng closed the pull request at:

https://github.com/apache/spark/pull/3980





[GitHub] spark pull request: [MLlib] SPARK-5954: Top by key

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5075#issuecomment-83797579
  
  [Test build #28898 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28898/consoleFull)
 for   PR 5075 at commit 
[`6f565c0`](https://github.com/apache/spark/commit/6f565c07aba25c18186c53eb329f56604baeb480).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5654] Integrate SparkR into Apache Spar...

2015-03-19 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/5077#issuecomment-83803157
  
Closing this one; @shivaram will open a new one.





[GitHub] spark pull request: [SPARK-5654] Integrate SparkR

2015-03-19 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5096#discussion_r26808894
  
--- Diff: core/src/main/scala/org/apache/spark/api/r/RBackend.scala ---
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.api.r
+
+import java.io.{DataOutputStream, File, FileOutputStream, IOException}
+import java.net.{InetSocketAddress, ServerSocket}
+import java.util.concurrent.TimeUnit
+
+import io.netty.bootstrap.ServerBootstrap
+import io.netty.channel.{ChannelFuture, ChannelInitializer, EventLoopGroup}
+import io.netty.channel.nio.NioEventLoopGroup
+import io.netty.channel.socket.SocketChannel
+import io.netty.channel.socket.nio.NioServerSocketChannel
+import io.netty.handler.codec.LengthFieldBasedFrameDecoder
+import io.netty.handler.codec.bytes.{ByteArrayDecoder, ByteArrayEncoder}
+
+import org.apache.spark.Logging
+
+/**
+ * Netty-based backend server that is used to communicate between R and 
Java.
+ */
+private[spark] class RBackend {
+
+  var channelFuture: ChannelFuture = null  
+  var bootstrap: ServerBootstrap = null
+  var bossGroup: EventLoopGroup = null
+
+  def init(): Int = {
+bossGroup = new NioEventLoopGroup(2)
+val workerGroup = bossGroup
+val handler = new RBackendHandler(this)
+  
+bootstrap = new ServerBootstrap()
+  .group(bossGroup, workerGroup)
+  .channel(classOf[NioServerSocketChannel])
+  
+bootstrap.childHandler(new ChannelInitializer[SocketChannel]() {
+  def initChannel(ch: SocketChannel) = {
+ch.pipeline()
+  .addLast("encoder", new ByteArrayEncoder())
+  .addLast("frameDecoder",
+// maxFrameLength = 2G
+// lengthFieldOffset = 0
+// lengthFieldLength = 4
+// lengthAdjustment = 0
+// initialBytesToStrip = 4, i.e. strip out the length field itself
+new LengthFieldBasedFrameDecoder(Integer.MAX_VALUE, 0, 4, 0, 4))
+  .addLast("decoder", new ByteArrayDecoder())
+  .addLast("handler", handler)
+  }
+})
+
+channelFuture = bootstrap.bind(new InetSocketAddress(0))
+channelFuture.syncUninterruptibly()
+
channelFuture.channel().localAddress().asInstanceOf[InetSocketAddress].getPort()
+  }
+
+  def run() = {
+channelFuture.channel.closeFuture().syncUninterruptibly()
+  }
+
+  def close() = {
--- End diff --

Add a `: Unit =`.
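
For illustration, the suggested style spells out the return type on a 
side-effecting method. The body below is only a plausible stand-in, since 
the quoted diff cuts off before it:

    def close(): Unit = {
      if (channelFuture != null) {
        channelFuture.channel().close().syncUninterruptibly()
      }
    }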





[GitHub] spark pull request: [SPARK-5654] Integrate SparkR

2015-03-19 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5096#discussion_r26808896
  
--- Diff: core/src/main/scala/org/apache/spark/api/r/RBackend.scala ---
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.api.r
+
+import java.io.{DataOutputStream, File, FileOutputStream, IOException}
+import java.net.{InetSocketAddress, ServerSocket}
+import java.util.concurrent.TimeUnit
+
+import io.netty.bootstrap.ServerBootstrap
+import io.netty.channel.{ChannelFuture, ChannelInitializer, EventLoopGroup}
+import io.netty.channel.nio.NioEventLoopGroup
+import io.netty.channel.socket.SocketChannel
+import io.netty.channel.socket.nio.NioServerSocketChannel
+import io.netty.handler.codec.LengthFieldBasedFrameDecoder
+import io.netty.handler.codec.bytes.{ByteArrayDecoder, ByteArrayEncoder}
+
+import org.apache.spark.Logging
+
+/**
+ * Netty-based backend server that is used to communicate between R and 
Java.
+ */
+private[spark] class RBackend {
+
+  var channelFuture: ChannelFuture = null  
--- End diff --

`private`
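
A minimal sketch of the suggested tightening, assuming nothing outside 
RBackend needs these fields:

    private var channelFuture: ChannelFuture = null
    private var bootstrap: ServerBootstrap = null
    private var bossGroup: EventLoopGroup = null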





[GitHub] spark pull request: [MLlib] SPARK-5954: Top by key

2015-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5075#issuecomment-83823945
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28898/
Test PASSed.





[GitHub] spark pull request: [MLlib] SPARK-5954: Top by key

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5075#issuecomment-83823917
  
  [Test build #28898 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28898/consoleFull)
 for   PR 5075 at commit 
[`6f565c0`](https://github.com/apache/spark/commit/6f565c07aba25c18186c53eb329f56604baeb480).
 * This patch **passes all tests**.

 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class MLPairRDDFunctions[K: ClassTag, V: ClassTag](self: RDD[(K, V)]) 
extends Serializable `






[GitHub] spark pull request: [SPARK-6406] Launcher backward compatibility i...

2015-03-19 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/5085#issuecomment-83824416
  
Please ignore the comment above (I misread the regex). However, we do need 
to relax the hadoop check: CDH itself names the outermost jar 
spark-assembly.jar. As to why we did not catch this issue with 
compute-classpath.sh, the short answer is that we had our own custom 
version of it.
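
A relaxed pattern along these lines might look like the following sketch 
(illustrative only, not the launcher's actual regex): the version and 
hadoop suffix become optional, so CDH's bare spark-assembly.jar also 
matches.

    // Hypothetical pattern, not the code under review.
    val assemblyJar = """spark-assembly(-[\d.]+)?(-hadoop[\d.]+)?\.jar""".r
    assemblyJar.pattern.matcher("spark-assembly.jar").matches() // true
    assemblyJar.pattern.matcher("spark-assembly-1.3.0-hadoop2.4.0.jar").matches() // true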





[GitHub] spark pull request: [SPARK-6370][core] Documentation: Improve all ...

2015-03-19 Thread mbonaci
GitHub user mbonaci opened a pull request:

https://github.com/apache/spark/pull/5097

[SPARK-6370][core] Documentation: Improve all 3 docs for RDD.sample

The docs for the `sample` method were insufficient, now less so.
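
For reference, the method being documented is typically invoked like this 
(values here are illustrative):

    // 10% sample without replacement, with a fixed seed for reproducibility.
    val rdd = sc.parallelize(1 to 1000)
    val sampled = rdd.sample(withReplacement = false, fraction = 0.1, seed = 42L)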

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mbonaci/spark-1 master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5097.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5097


commit a6a9d9756584ec503b4c4e3a25bbae4b2944c3a7
Author: mbonaci mbon...@gmail.com
Date:   2015-03-20T00:39:22Z

[SPARK-6370][core] Documentation: Improve all 3 docs for RDD.sample method







[GitHub] spark pull request: [SPARK-5654] Integrate SparkR

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5096#issuecomment-83808522
  
  [Test build #28899 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28899/consoleFull)
 for   PR 5096 at commit 
[`3eacfc0`](https://github.com/apache/spark/commit/3eacfc072758a445d1f01b29001c69683ac5b457).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5654] Integrate SparkR

2015-03-19 Thread shivaram
Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/5096#issuecomment-83807329
  
@pwendell @rxin We might push some more fixes as they come in, but I think 
this should be ready for review.





[GitHub] spark pull request: [SPARK-5654] Integrate SparkR

2015-03-19 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5096#discussion_r26808779
  
--- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala ---
@@ -0,0 +1,515 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.api.r
+
+import java.io._
+import java.net.ServerSocket
+import java.util.{Map => JMap}
+
+import scala.collection.JavaConversions._
+import scala.io.Source
+import scala.reflect.ClassTag
+import scala.util.Try
+
+import org.apache.spark.api.java.{JavaPairRDD, JavaRDD, JavaSparkContext}
+import org.apache.spark.broadcast.Broadcast
+import org.apache.spark.rdd.RDD
+import org.apache.spark._
+
+private abstract class BaseRRDD[T: ClassTag, U: ClassTag](
+parent: RDD[T],
+numPartitions: Int,
+func: Array[Byte],
+deserializer: String,
+serializer: String,
+packageNames: Array[Byte],
+rLibDir: String,
+broadcastVars: Array[Broadcast[Object]])
+  extends RDD[U](parent) with Logging {
+  override def getPartitions = parent.partitions
+
+  override def compute(split: Partition, context: TaskContext): 
Iterator[U] = {
+
+// The parent may be also an RRDD, so we should launch it first.
+val parentIterator = firstParent[T].iterator(split, context)
+
+// we expect two connections
+val serverSocket = new ServerSocket(0, 2)
+val listenPort = serverSocket.getLocalPort()
+
+// The stdout/stderr is shared by multiple tasks, because we use one 
daemon
+// to launch child process as worker.
+val errThread = RRDD.createRWorker(rLibDir, listenPort)
+
+// We use two sockets to separate input and output, then it's easy to 
manage
+// the lifecycle of them to avoid deadlock.
+// TODO: optimize it to use one socket
+
+// the socket used to send out the input of task
+serverSocket.setSoTimeout(1)
+val inSocket = serverSocket.accept()
+startStdinThread(inSocket.getOutputStream(), parentIterator, 
split.index)
+
+// the socket used to receive the output of task
+val outSocket = serverSocket.accept()
+val inputStream = new BufferedInputStream(outSocket.getInputStream)
+val dataStream = openDataStream(inputStream)
+serverSocket.close()
+
+try {
+
+  return new Iterator[U] {
+def next(): U = {
+  val obj = _nextObj
+  if (hasNext) {
+_nextObj = read()
+  }
+  obj
+}
+
+var _nextObj = read()
+
+def hasNext(): Boolean = {
+  val hasMore = (_nextObj != null)
+  if (!hasMore) {
+dataStream.close()
+  }
+  hasMore
+}
+  }
+} catch {
+  case e: Exception =>
+throw new SparkException("R computation failed with\n" + errThread.getLines())
+}
+  }
+
+  /**
+   * Start a thread to write RDD data to the R process.
+   */
+  private def startStdinThread[T](
+output: OutputStream,
+iter: Iterator[T],
+splitIndex: Int) = {
+
+val env = SparkEnv.get
+val bufferSize = System.getProperty("spark.buffer.size", "65536").toInt
+val stream = new BufferedOutputStream(output, bufferSize)
+
+new Thread("writer for R") {
+  override def run() {
+try {
+  SparkEnv.set(env)
+  val dataOut = new DataOutputStream(stream)
+  dataOut.writeInt(splitIndex)
+
+  SerDe.writeString(dataOut, deserializer)
+  SerDe.writeString(dataOut, serializer)
+
+  dataOut.writeInt(packageNames.length)
+  dataOut.write(packageNames)
+
+  dataOut.writeInt(func.length)
+  dataOut.write(func)
+
+  dataOut.writeInt(broadcastVars.length)
+  

[GitHub] spark pull request: Tighten up field/method visibility in Executor...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4850#issuecomment-83810758
  
  [Test build #28892 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28892/consoleFull)
 for   PR 4850 at commit 
[`866fc60`](https://github.com/apache/spark/commit/866fc60652c3f98c6c608ca6c25c33f4219a540c).
 * This patch **passes all tests**.

 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class TaskCommitDenied(jobID: Int, partitionID: Int, attemptID: 
Int) extends TaskFailedReason `
  * `class ExecutorSource(threadPool: ThreadPoolExecutor, executorId: 
String) extends Source `






[GitHub] spark pull request: Tighten up field/method visibility in Executor...

2015-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4850#issuecomment-83810794
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28892/
Test PASSed.





[GitHub] spark pull request: [MLlib] SPARK-5954: Top by key

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5075#issuecomment-83821404
  
  [Test build #28897 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28897/consoleFull)
 for   PR 5075 at commit 
[`a80e0ec`](https://github.com/apache/spark/commit/a80e0ecd0ce96ffbeeaeb933dea1cada60e5863c).
 * This patch **passes all tests**.

 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class MLPairRDDFunctions[K: ClassTag, V: ClassTag](self: RDD[(K, V)]) 
extends Serializable `
  * `  class CheckpointWriteHandler(`






[GitHub] spark pull request: [MLlib] SPARK-5954: Top by key

2015-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5075#issuecomment-83821436
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28897/
Test PASSed.





[GitHub] spark pull request: [SPARK-6406] Launcher backward compatibility i...

2015-03-19 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/5085#issuecomment-83825622
  
Sorry, clicked on the close button in error.





[GitHub] spark pull request: [SPARK-6371] [build] Update version to 1.4.0-S...

2015-03-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5056#discussion_r26804471
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala ---
@@ -128,7 +128,7 @@ abstract class VertexRDD[VD](
*
* @param other the other RDD[(VertexId, VD)] with which to diff against.
*/
-  def diff(other: RDD[(VertexId, VD)]): VertexRDD[VD]
+  def diff(other: RDD[(VertexId, VD)]): VertexRDD[VD] = ???
--- End diff --

We shouldn't put `???` here, since it implies that we intend to implement 
this but just haven't gotten around to it yet.
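
The contrast, as a toy sketch (not the GraphX code itself):

    // An abstract member forces every concrete subclass to implement it;
    // `???` compiles but throws scala.NotImplementedError when first called.
    abstract class Preferred { def diff(other: Seq[Int]): Seq[Int] }
    abstract class Risky { def diff(other: Seq[Int]): Seq[Int] = ??? }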





[GitHub] spark pull request: [SPARK-6367][SQL] Use the proper data type for...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5094#issuecomment-83787628
  
  [Test build #28893 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28893/consoleFull)
 for   PR 5094 at commit 
[`d427d20`](https://github.com/apache/spark/commit/d427d20c0c347a16798589e89476d8c36b6ee353).
 * This patch merges cleanly.





[GitHub] spark pull request: [MLlib] SPARK-5954: Top by key

2015-03-19 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5075#discussion_r26805662
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala 
---
@@ -163,6 +163,28 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
   }
 
   /**
+   * Returns the top k (largest) elements for each key from this RDD as 
defined by the specified
+   * implicit Ordering[T].
+   * If the number of elements for a certain key is less than k, all of 
them will be returned.
+   *
+   * @param num k, the number of top elements to return
+   * @param ord the implicit ordering for T
+   * @return an RDD that contains the top k values for each key
+   */
+  def topByKey(num: Int)(implicit ord: Ordering[V]): RDD[(K, Array[V])] = {
+aggregateByKey(new BoundedPriorityQueue[V](num)(ord))(
+  seqOp = (queue, item) => {
+queue += item
+queue
+  },
+  combOp = (queue1, queue2) => {
+queue1 ++= queue2
+queue1
+  }
+).mapValues(_.toArray.sorted(ord.reverse))
--- End diff --

Hm, OK, that surprises me, but if you verified it is required, leave it, of 
course.
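
For context, a small usage sketch of the method under review, assuming the 
implicit conversion that provides `topByKey` is in scope:

    val pairs = sc.parallelize(Seq(("a", 1), ("a", 5), ("a", 3), ("b", 2)))
    // Top 2 values per key, sorted descending.
    val top2 = pairs.topByKey(2).collect()
    // e.g. Array(("a", Array(5, 3)), ("b", Array(2)))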





[GitHub] spark pull request: [SPARK-5654] Integrate SparkR

2015-03-19 Thread shivaram
GitHub user shivaram opened a pull request:

https://github.com/apache/spark/pull/5096

[SPARK-5654] Integrate SparkR

This pull request integrates SparkR, an R frontend for Spark. The SparkR 
package contains both RDD and DataFrame APIs in R and is integrated with 
Spark's submission scripts to work on different cluster managers.

Some integration points that would be great to get feedback on:

1. Build procedure: SparkR requires R to be installed on the build machine. 
Right now we have a new Maven profile `-PsparkR` that can be used to enable 
SparkR builds.

2. YARN cluster mode: The R package that is built needs to be present on 
the driver and all the worker nodes during execution. The R package location 
is currently set using SPARK_HOME, but this might not work in YARN cluster 
mode.

The SparkR package represents the work of many contributors and attached 
below is a list of people along with areas they worked on

edwardt (@edwart) - Documentation improvements
Felix Cheung (@felixcheung) - Documentation improvements
Hossein Falaki (@falaki)  - Documentation improvements
Chris Freeman (@cafreeman) - DataFrame API, Programming Guide
Todd Gao (@7c00) - R worker Internals
Ryan Hafen (@hafen) - SparkR Internals
Qian Huang (@hqzizania) - RDD API
Hao Lin (@hlin09) - RDD API, Closure cleaner
Evert Lammerts (@evertlammerts) - DataFrame API
Davies Liu (@davies) - DataFrame API, R worker internals, Merging with 
Spark 
Yi Lu (@lythesia) - RDD API, Worker internals
Matt Massie (@massie) - Jenkins build
Harihar Nahak (@hnahak87) - SparkR examples
Oscar Olmedo (@oscaroboto) - Spark configuration
Antonio Piccolboni (@piccolbo) - SparkR examples, Namespace bug fixes
Dan Putler (@dputler) - Dataframe API, SparkR Install Guide
Ashutosh Raina (@ashutoshraina) - Build improvements
Josh Rosen (@joshrosen) - Travis CI build
Sun Rui (@sun-rui)- RDD API, JVM Backend, Shuffle improvements
Shivaram Venkataraman (@shivaram) - RDD API, JVM Backend, Worker Internals
Zongheng Yang (@concretevitamin) - RDD API, Pipelined RDDs, Examples and 
EC2 guide

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amplab-extras/spark R

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5096.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5096


commit 9aa4acfeb2180b5b7c44302e1500d1bfe0639485
Author: Shivaram Venkataraman shivaram.venkatara...@gmail.com
Date:   2015-02-27T18:56:32Z

Merge pull request #184 from davies/socket

[SPARKR-155] use socket in R worker

commit 798f4536d9dfb069e0c8f1bbd1fb24be404a7c14
Author: cafreeman cfree...@alteryx.com
Date:   2015-02-27T20:04:22Z

Merge branch 'sparkr-sql' into dev

commit 3b4642980547714373ab1960cb9a096e2fcf233a
Author: Davies Liu davies@gmail.com
Date:   2015-02-27T22:07:30Z

Merge branch 'master' of github.com:amplab-extras/SparkR-pkg into random

commit 5ef66fb8b03a635e309a5004a1b411b50f63ef9c
Author: Davies Liu davies@gmail.com
Date:   2015-02-27T22:33:07Z

send back the port via temporary file

commit 2808dcfd2c0630625a5aa723cf0dbce642cd8f95
Author: cafreeman cfree...@alteryx.com
Date:   2015-02-27T23:54:17Z

Three more DataFrame methods

- `repartition`
- `distinct`
- `sampleDF`

commit cad0f0ca8c11ec5b3412b9926c92e89297a31b0a
Author: cafreeman cfree...@alteryx.com
Date:   2015-02-28T00:46:58Z

Fix docs and indents

commit 27dd3a09ce37d8afe385ccda35b425ac5655905c
Author: lythesia iranaik...@gmail.com
Date:   2015-02-28T02:00:41Z

modify tests for repartition

commit 889c265ee41f8faf3ee72e253cf019cb3a9a65a5
Author: cafreeman cfree...@alteryx.com
Date:   2015-02-28T02:08:18Z

numToInt utility function

Added `numToInt` converter function for allowing numeric arguments when 
integers are required. Updated `repartition`.

commit 7b0d070bc0fd18e26d94dfd4dbcc500963faa5bb
Author: lythesia iranaik...@gmail.com
Date:   2015-02-28T02:10:35Z

keep partitions check

commit b0e7f731f4c64daac27a975a87b22c7276bbfe61
Author: cafreeman cfree...@alteryx.com
Date:   2015-02-28T02:28:08Z

Update `sampleDF` test

commit ad0935ef12fc6639a6ce45f1860d0f62c07ae838
Author: lythesia iranaik...@gmail.com
Date:   2015-02-28T02:50:34Z

minor fixes

commit 613464951add64f1f42a1bb814d86c0aa979cc18
Author: Shivaram Venkataraman shivaram.venkatara...@gmail.com
Date:   2015-02-28T03:05:45Z

Merge pull request #187 from cafreeman/sparkr-sql

Three more DataFrame methods

commit 0346e5fc907aab71aef122e6ddc1b96f93d9abbf
Author: Davies Liu davies@gmail.com
Date:   2015-02-28T07:05:42Z

address comment

commit a00f5029279ca1e14afb4f1b63d91e946bddfd73
Author: lythesia 

[GitHub] spark pull request: [SPARK-5654] Integrate SparkR

2015-03-19 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5096#discussion_r26809045
  
--- Diff: core/src/main/scala/org/apache/spark/api/r/RBackendHandler.scala 
---
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.api.r
+
+import java.io.{ByteArrayInputStream, ByteArrayOutputStream, 
DataInputStream, DataOutputStream}
+
+import scala.collection.mutable.HashMap
+
+import io.netty.channel.ChannelHandler.Sharable
+import io.netty.channel.{ChannelHandlerContext, 
SimpleChannelInboundHandler}
+
+import org.apache.spark.Logging
+import org.apache.spark.api.r.SerDe._
+
+/**
+ * Handler for RBackend
+ * TODO: This is marked as sharable to get a handle to RBackend. Is it 
safe to re-use
+ * this across connections ?
+ */
+@Sharable
+private[r] class RBackendHandler(server: RBackend)
+  extends SimpleChannelInboundHandler[Array[Byte]] with Logging {
+
+  override def channelRead0(ctx: ChannelHandlerContext, msg: Array[Byte]) {
+val bis = new ByteArrayInputStream(msg)
+val dis = new DataInputStream(bis)
+
+val bos = new ByteArrayOutputStream()
+val dos = new DataOutputStream(bos)
+
+// First bit is isStatic
+val isStatic = readBoolean(dis)
+val objId = readString(dis)
+val methodName = readString(dis)
+val numArgs = readInt(dis)
+
+if (objId == "SparkRHandler") {
+  methodName match {
+case "stopBackend" =>
+  writeInt(dos, 0)
+  writeType(dos, "void")
+  server.close()
+case "rm" =>
+  try {
+val t = readObjectType(dis)
+assert(t == 'c')
+val objToRemove = readString(dis)
+JVMObjectTracker.remove(objToRemove)
+writeInt(dos, 0)
+writeObject(dos, null)
+  } catch {
+case e: Exception =>
+  logError(s"Removing $objId failed", e)
+  writeInt(dos, -1)
+  }
+case _ => dos.writeInt(-1)
+  }
+} else {
+  handleMethodCall(isStatic, objId, methodName, numArgs, dis, dos)
+}
+
+val reply = bos.toByteArray
+ctx.write(reply)
+  }
+  
+  override def channelReadComplete(ctx: ChannelHandlerContext) {
+ctx.flush()
+  }
+
+  override def exceptionCaught(ctx: ChannelHandlerContext, cause: 
Throwable) {
+// Close the connection when an exception is raised.
+cause.printStackTrace()
+ctx.close()
+  }
+
+  def handleMethodCall(
+  isStatic: Boolean,
+  objId: String,
+  methodName: String,
+  numArgs: Int,
+  dis: DataInputStream,
+  dos: DataOutputStream) {
+var obj: Object = null
+try {
+  val cls = if (isStatic) {
+Class.forName(objId)
+  } else {
+JVMObjectTracker.get(objId) match {
+  case None => throw new IllegalArgumentException("Object not found " + objId)
+  case Some(o) =>
+obj = o
+o.getClass
+}
+  }
+
+  val args = readArgs(numArgs, dis)
+
+  val methods = cls.getMethods
+  val selectedMethods = methods.filter(m => m.getName == methodName)
+  if (selectedMethods.length > 0) {
+val methods = selectedMethods.filter { x =>
+  matchMethod(numArgs, args, x.getParameterTypes)
+}
+if (methods.isEmpty) {
+  logWarning(s"cannot find matching method ${cls}.$methodName. "
++ s"Candidates are:")
+  selectedMethods.foreach { method =>
+logWarning(s"$methodName(${method.getParameterTypes.mkString(",")})")
+  }
+  throw new Exception(s"No matched method found for $cls.$methodName")
+}
+val ret = methods.head.invoke(obj, args:_*)
+
   

[GitHub] spark pull request: [SQL] Checking data types when resolving types

2015-03-19 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4685#discussion_r26810586
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 ---
@@ -18,21 +18,28 @@
 package org.apache.spark.sql.catalyst.expressions
 
 import org.apache.spark.sql.catalyst.analysis.UnresolvedException
-import org.apache.spark.sql.catalyst.errors.TreeNodeException
 import org.apache.spark.sql.types._
 
 case class UnaryMinus(child: Expression) extends UnaryExpression {
   type EvaluatedType = Any
 
+  override lazy val resolved = child.resolved &&
+(child.dataType.isInstanceOf[NumericType] || child.dataType.isInstanceOf[NullType])
+
   def dataType = child.dataType
   override def foldable = child.foldable
   def nullable = child.nullable
   override def toString = s"-$child"
 
-  lazy val numeric = dataType match {
-case n: NumericType => n.numeric.asInstanceOf[Numeric[Any]]
-case other => sys.error(s"Type $other does not support numeric operations")
-  }
+  val numeric =
+if (resolved) {
+  dataType match {
+case n: NumericType => n.numeric.asInstanceOf[Numeric[Any]]
+case n: NullType => UnresolvedNumeric
+  }
+} else {
+  UnresolvedNumeric
+}
--- End diff --

Instead of `UnresolvedNumeric`, how about just letting it be `null`?
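
Roughly, that suggestion would read like this sketch (illustrative, not a 
final patch): fall back to `null` instead of a sentinel object, with the 
understanding that callers must not touch `numeric` until the expression 
resolves.

    val numeric: Numeric[Any] =
      if (resolved) {
        dataType match {
          case n: NumericType => n.numeric.asInstanceOf[Numeric[Any]]
          case _: NullType => null
        }
      } else {
        null
      }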





[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5093#issuecomment-83775816
  
  [Test build #2 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/2/consoleFull)
 for   PR 5093 at commit 
[`94d3547`](https://github.com/apache/spark/commit/94d35478c8205386ac4ff0e265a0bfbb073bc8c7).
 * This patch **passes all tests**.

 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch adds no new dependencies





[GitHub] spark pull request: [SPARK-6367][SQL] Use the proper data type for...

2015-03-19 Thread yhuai
GitHub user yhuai opened a pull request:

https://github.com/apache/spark/pull/5094

[SPARK-6367][SQL] Use the proper data type for those expressions that are 
hijacking existing data types.

This PR adds internal UDTs for expressions that are hijacking existing data 
types.
The following UDTs are added:
* `HyperLogLogUDT` (`BinaryType` as the SQL type) for 
`ApproxCountDistinctPartition`
* `OpenHashSetUDT` (`ArrayType` as the SQL type) for `CollectHashSet`, 
`NewSet`, `AddItemToSet`, and `CombineSets`. 

I am also adding more unit tests for aggregation with code gen enabled.

JIRA: https://issues.apache.org/jira/browse/SPARK-6367
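
As a rough sketch of the pattern (a toy UDT, not one of the PR's actual 
classes), an internal type declares its SQL-visible representation instead 
of hijacking it:

    import org.apache.spark.sql.types._

    // Illustrative only: a UDT whose SQL type is BinaryType, so the analyzer
    // sees binary data while Scala code works with BigInt values.
    private class BigIntUDT extends UserDefinedType[BigInt] {
      override def sqlType: DataType = BinaryType
      override def serialize(obj: Any): Array[Byte] = obj match {
        case b: BigInt => b.toByteArray
      }
      override def deserialize(datum: Any): BigInt = datum match {
        case bytes: Array[Byte] => BigInt(bytes)
      }
      override def userClass: Class[BigInt] = classOf[BigInt]
    }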

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yhuai/spark expressionType

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5094.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5094


commit a384b510c0cbfbf44855d2939aae737c26c20c85
Author: Yin Huai yh...@databricks.com
Date:   2015-03-19T21:59:04Z

Add UDTs for expressions that return HyperLogLog and OpenHashSet.







[GitHub] spark pull request: [SPARK-5654] Integrate SparkR into Apache Spar...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5077#issuecomment-83775633
  
  [Test build #28885 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28885/consoleFull)
 for   PR 5077 at commit 
[`3eacfc0`](https://github.com/apache/spark/commit/3eacfc072758a445d1f01b29001c69683ac5b457).
 * This patch **passes all tests**.

 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...

2015-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5093#issuecomment-83780611
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28890/
Test FAILed.





[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5093#issuecomment-83794104
  
  [Test build #28895 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28895/consoleFull)
 for   PR 5093 at commit 
[`f8011d8`](https://github.com/apache/spark/commit/f8011d8886e0a2a2db74ae5715cb324eb30eedbb).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...

2015-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5093#issuecomment-83794115
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28895/
Test FAILed.





[GitHub] spark pull request: [SPARK-6420] Driver's Block Manager does not u...

2015-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5095#issuecomment-83802093
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [MLlib] SPARK-5954: Top by key

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5075#issuecomment-83810324
  
  [Test build #28891 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28891/consoleFull)
 for   PR 5075 at commit 
[`82dded9`](https://github.com/apache/spark/commit/82dded96926f98d8a72cf40cbbc6987b191962f0).
 * This patch **passes all tests**.

 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class MLPairRDDFunctions[K: ClassTag, V: ClassTag](self: RDD[(K, V)]) 
extends Serializable `
  * `  class CheckpointWriteHandler(`






[GitHub] spark pull request: [MLlib] SPARK-5954: Top by key

2015-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5075#issuecomment-83810341
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28891/
Test PASSed.





[GitHub] spark pull request: [SPARK-5654] Integrate SparkR

2015-03-19 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5096#discussion_r26808740
  
--- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala ---
@@ -0,0 +1,515 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.api.r
+
+import java.io._
+import java.net.ServerSocket
+import java.util.{Map => JMap}
+
+import scala.collection.JavaConversions._
+import scala.io.Source
+import scala.reflect.ClassTag
+import scala.util.Try
+
+import org.apache.spark.api.java.{JavaPairRDD, JavaRDD, JavaSparkContext}
+import org.apache.spark.broadcast.Broadcast
+import org.apache.spark.rdd.RDD
+import org.apache.spark._
+
+private abstract class BaseRRDD[T: ClassTag, U: ClassTag](
+parent: RDD[T],
+numPartitions: Int,
+func: Array[Byte],
+deserializer: String,
+serializer: String,
+packageNames: Array[Byte],
+rLibDir: String,
+broadcastVars: Array[Broadcast[Object]])
+  extends RDD[U](parent) with Logging {
+  override def getPartitions = parent.partitions
+
+  override def compute(split: Partition, context: TaskContext): 
Iterator[U] = {
+
+// The parent may be also an RRDD, so we should launch it first.
+val parentIterator = firstParent[T].iterator(split, context)
+
+// we expect two connections
+val serverSocket = new ServerSocket(0, 2)
+val listenPort = serverSocket.getLocalPort()
+
+// The stdout/stderr is shared by multiple tasks, because we use one 
daemon
+// to launch child process as worker.
+val errThread = RRDD.createRWorker(rLibDir, listenPort)
+
+// We use two sockets to separate input and output, then it's easy to 
manage
+// the lifecycle of them to avoid deadlock.
+// TODO: optimize it to use one socket
+
+// the socket used to send out the input of task
+serverSocket.setSoTimeout(1)
+val inSocket = serverSocket.accept()
+startStdinThread(inSocket.getOutputStream(), parentIterator, 
split.index)
+
+// the socket used to receive the output of task
+val outSocket = serverSocket.accept()
+val inputStream = new BufferedInputStream(outSocket.getInputStream)
+val dataStream = openDataStream(inputStream)
+serverSocket.close()
+
+try {
+
+  return new Iterator[U] {
+def next(): U = {
+  val obj = _nextObj
+  if (hasNext) {
+_nextObj = read()
+  }
+  obj
+}
+
+var _nextObj = read()
+
+def hasNext(): Boolean = {
+  val hasMore = (_nextObj != null)
+  if (!hasMore) {
+dataStream.close()
+  }
+  hasMore
+}
+  }
+} catch {
+  case e: Exception =>
+throw new SparkException("R computation failed with\n" + errThread.getLines())
+}
+  }
+
+  /**
+   * Start a thread to write RDD data to the R process.
+   */
+  private def startStdinThread[T](
+output: OutputStream,
+iter: Iterator[T],
+splitIndex: Int) = {
+
+val env = SparkEnv.get
+val bufferSize = System.getProperty("spark.buffer.size", "65536").toInt
+val stream = new BufferedOutputStream(output, bufferSize)
+
+new Thread("writer for R") {
+  override def run() {
+try {
+  SparkEnv.set(env)
+  val dataOut = new DataOutputStream(stream)
+  dataOut.writeInt(splitIndex)
+
+  SerDe.writeString(dataOut, deserializer)
+  SerDe.writeString(dataOut, serializer)
+
+  dataOut.writeInt(packageNames.length)
+  dataOut.write(packageNames)
+
+  dataOut.writeInt(func.length)
+  dataOut.write(func)
+
+  dataOut.writeInt(broadcastVars.length)
+  

[GitHub] spark pull request: [SPARK-6286][Mesos][minor] Handle missing Meso...

2015-03-19 Thread dragos
Github user dragos commented on the pull request:

https://github.com/apache/spark/pull/5088#issuecomment-83810423
  
LGTM, FWIW :)





[GitHub] spark pull request: [SPARK-5654] Integrate SparkR

2015-03-19 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5096#discussion_r26809309
  
--- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala ---
@@ -0,0 +1,515 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.api.r
+
+import java.io._
+import java.net.ServerSocket
+import java.util.{Map => JMap}
+
+import scala.collection.JavaConversions._
+import scala.io.Source
+import scala.reflect.ClassTag
+import scala.util.Try
+
+import org.apache.spark.api.java.{JavaPairRDD, JavaRDD, JavaSparkContext}
+import org.apache.spark.broadcast.Broadcast
+import org.apache.spark.rdd.RDD
+import org.apache.spark._
+
+private abstract class BaseRRDD[T: ClassTag, U: ClassTag](
+parent: RDD[T],
+numPartitions: Int,
+func: Array[Byte],
+deserializer: String,
+serializer: String,
+packageNames: Array[Byte],
+rLibDir: String,
+broadcastVars: Array[Broadcast[Object]])
+  extends RDD[U](parent) with Logging {
+  override def getPartitions = parent.partitions
+
+  override def compute(split: Partition, context: TaskContext): 
Iterator[U] = {
+
+// The parent may be also an RRDD, so we should launch it first.
+val parentIterator = firstParent[T].iterator(split, context)
+
+// we expect two connections
+val serverSocket = new ServerSocket(0, 2)
+val listenPort = serverSocket.getLocalPort()
+
+// The stdout/stderr is shared by multiple tasks, because we use one 
daemon
+// to launch child process as worker.
+val errThread = RRDD.createRWorker(rLibDir, listenPort)
+
+// We use two sockets to separate input and output, then it's easy to 
manage
+// the lifecycle of them to avoid deadlock.
+// TODO: optimize it to use one socket
+
+// the socket used to send out the input of task
+serverSocket.setSoTimeout(1)
+val inSocket = serverSocket.accept()
+startStdinThread(inSocket.getOutputStream(), parentIterator, 
split.index)
+
+// the socket used to receive the output of task
+val outSocket = serverSocket.accept()
+val inputStream = new BufferedInputStream(outSocket.getInputStream)
+val dataStream = openDataStream(inputStream)
+serverSocket.close()
+
+try {
+
+  return new Iterator[U] {
+def next(): U = {
+  val obj = _nextObj
+  if (hasNext) {
+_nextObj = read()
+  }
+  obj
+}
+
+var _nextObj = read()
+
+def hasNext(): Boolean = {
+  val hasMore = (_nextObj != null)
+  if (!hasMore) {
+dataStream.close()
+  }
+  hasMore
+}
+  }
+} catch {
+  case e: Exception =>
+throw new SparkException("R computation failed with\n" + errThread.getLines())
+}
+  }
+
+  /**
+   * Start a thread to write RDD data to the R process.
+   */
+  private def startStdinThread[T](
+output: OutputStream,
+iter: Iterator[T],
+splitIndex: Int) = {
+
+val env = SparkEnv.get
+val bufferSize = System.getProperty("spark.buffer.size", "65536").toInt
+val stream = new BufferedOutputStream(output, bufferSize)
+
+new Thread("writer for R") {
+  override def run() {
+try {
+  SparkEnv.set(env)
+  val dataOut = new DataOutputStream(stream)
+  dataOut.writeInt(splitIndex)
+
+  SerDe.writeString(dataOut, deserializer)
+  SerDe.writeString(dataOut, serializer)
+
+  dataOut.writeInt(packageNames.length)
+  dataOut.write(packageNames)
+
+  dataOut.writeInt(func.length)
+  dataOut.write(func)
+
+  dataOut.writeInt(broadcastVars.length)
+  

[GitHub] spark pull request: [SPARK-5654] Integrate SparkR

2015-03-19 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5096#discussion_r26809287
  
--- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala ---
@@ -0,0 +1,515 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.api.r
+
+import java.io._
+import java.net.ServerSocket
+import java.util.{Map => JMap}
+
+import scala.collection.JavaConversions._
+import scala.io.Source
+import scala.reflect.ClassTag
+import scala.util.Try
+
+import org.apache.spark.api.java.{JavaPairRDD, JavaRDD, JavaSparkContext}
+import org.apache.spark.broadcast.Broadcast
+import org.apache.spark.rdd.RDD
+import org.apache.spark._
+
+private abstract class BaseRRDD[T: ClassTag, U: ClassTag](
+parent: RDD[T],
+numPartitions: Int,
+func: Array[Byte],
+deserializer: String,
+serializer: String,
+packageNames: Array[Byte],
+rLibDir: String,
+broadcastVars: Array[Broadcast[Object]])
+  extends RDD[U](parent) with Logging {
+  override def getPartitions = parent.partitions
+
+  override def compute(split: Partition, context: TaskContext): Iterator[U] = {
+
+// The parent may be also an RRDD, so we should launch it first.
+val parentIterator = firstParent[T].iterator(split, context)
+
+// we expect two connections
+val serverSocket = new ServerSocket(0, 2)
+val listenPort = serverSocket.getLocalPort()
+
+// The stdout/stderr is shared by multiple tasks, because we use one daemon
+// to launch child process as worker.
+val errThread = RRDD.createRWorker(rLibDir, listenPort)
+
+// We use two sockets to separate input and output, then it's easy to manage
+// the lifecycle of them to avoid deadlock.
+// TODO: optimize it to use one socket
+
+// the socket used to send out the input of task
+serverSocket.setSoTimeout(10000)
+val inSocket = serverSocket.accept()
+startStdinThread(inSocket.getOutputStream(), parentIterator, split.index)
+
+// the socket used to receive the output of task
+val outSocket = serverSocket.accept()
+val inputStream = new BufferedInputStream(outSocket.getInputStream)
+val dataStream = openDataStream(inputStream)
+serverSocket.close()
+
+try {
+
+  return new Iterator[U] {
+def next(): U = {
+  val obj = _nextObj
+  if (hasNext) {
+_nextObj = read()
+  }
+  obj
+}
+
+var _nextObj = read()
+
+def hasNext(): Boolean = {
+  val hasMore = (_nextObj != null)
+  if (!hasMore) {
+dataStream.close()
+  }
+  hasMore
+}
+  }
+} catch {
+  case e: Exception =>
+throw new SparkException("R computation failed with\n " + errThread.getLines())
+}
+  }
+
+  /**
+   * Start a thread to write RDD data to the R process.
+   */
+  private def startStdinThread[T](
+output: OutputStream,
+iter: Iterator[T],
+splitIndex: Int) = {
+
+val env = SparkEnv.get
+val bufferSize = System.getProperty("spark.buffer.size", "65536").toInt
+val stream = new BufferedOutputStream(output, bufferSize)
+
+new Thread("writer for R") {
+  override def run() {
+try {
+  SparkEnv.set(env)
+  val dataOut = new DataOutputStream(stream)
+  dataOut.writeInt(splitIndex)
+
+  SerDe.writeString(dataOut, deserializer)
+  SerDe.writeString(dataOut, serializer)
+
+  dataOut.writeInt(packageNames.length)
+  dataOut.write(packageNames)
+
+  dataOut.writeInt(func.length)
+  dataOut.write(func)
+
+  dataOut.writeInt(broadcastVars.length)
+  
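
A stripped-down sketch of the look-ahead iterator idiom used in compute() above: one element is read eagerly so that hasNext can test for the end-of-stream sentinel (null here). The helper below is illustrative, not taken from the PR.

```scala
// read() yields elements and null at end-of-stream; close() releases the
// underlying stream once the sentinel has been seen.
def lookAheadIterator[T >: Null](read: () => T, close: () => Unit): Iterator[T] =
  new Iterator[T] {
    private var nextObj: T = read()   // eager first read
    override def hasNext: Boolean = {
      val hasMore = nextObj != null
      if (!hasMore) close()           // sentinel seen: release the stream
      hasMore
    }
    override def next(): T = {
      val obj = nextObj
      if (hasNext) nextObj = read()   // pre-fetch the following element
      obj
    }
  }

// Example with an in-memory, null-terminated source:
val source = Iterator("a", "b", "c", null)
val it = lookAheadIterator(() => source.next(), () => ())
assert(it.toList == List("a", "b", "c"))
```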

[GitHub] spark pull request: [SPARK-6367][SQL] Use the proper data type for...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5094#issuecomment-83812455
  
  [Test build #28893 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28893/consoleFull)
 for   PR 5094 at commit 
[`d427d20`](https://github.com/apache/spark/commit/d427d20c0c347a16798589e89476d8c36b6ee353).
 * This patch **passes all tests**.

 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6367][SQL] Use the proper data type for...

2015-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5094#issuecomment-83812472
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28893/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6207] [YARN] [SQL] Adds delegation toke...

2015-03-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/5031#discussion_r26811012
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -903,6 +908,30 @@ object Client extends Logging {
   }
 
   /**
+   * Obtains token for the Hive metastore and adds them to the credentials.
+   */
+  private def obtainTokenForHiveMetastore(conf: Configuration, credentials: Credentials) {
+if (UserGroupInformation.isSecurityEnabled /* And Hive is enabled */) {
+  val hc = org.apache.hadoop.hive.ql.metadata.Hive.get
+  val principal = hc.getConf().get(HiveConf.ConfVars.METASTORE_KERBEROS_PRINCIPAL.varname)
+  val username = UserGroupInformation.getCurrentUser().getUserName
+
+  if (principal == null) {
+val errorMessage = "Required hive metastore principal is not configured!"
+logError(errorMessage)
+throw new IllegalArgumentException(errorMessage)
+  }
+
+  val tokenStr = hc.getDelegationToken(username, principal)
+  val hive2Token = new Token[DelegationTokenIdentifier]()
+  hive2Token.decodeFromUrlString(tokenStr)
+  credentials.addToken(new Text("hive.server2.delegation.token"), hive2Token)
+  logDebug("Added the Hive Server 2 token to conf.")
+  org.apache.hadoop.hive.ql.metadata.Hive.closeCurrent
--- End diff --

HDFS (namenode) delegation tokens are renewed by the YARN resourcemanager 
for you, until they expire after a week. (Then you need pr4688.) Unfortunately 
the resourcemanager doesn't handle hive or hbase tokens. I personally think 
putting in this code for hive, and then possibly hbase, so we know how to get 
the tokens is OK, as long as the interfaces we are using are public and not 
likely to change. However, we should have a way to skip it if it's not configured.

Yes, long-running services should be able to renew or reacquire with what 
Hari is doing. 
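
On the "skip it if it's not configured" point, a minimal sketch of that shape, assuming the same Hive/Hadoop classes the diff uses are on the classpath; the guard condition and method name are illustrative, not the eventual patch:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.metadata.Hive
import org.apache.hadoop.hive.thrift.DelegationTokenIdentifier
import org.apache.hadoop.io.Text
import org.apache.hadoop.security.{Credentials, UserGroupInformation}
import org.apache.hadoop.security.token.Token

// Only fetch a metastore token when security is on AND a metastore principal
// is actually configured; otherwise skip instead of failing the submission.
def obtainTokenForHiveMetastoreIfConfigured(conf: Configuration, credentials: Credentials): Unit = {
  val principal = conf.get(HiveConf.ConfVars.METASTORE_KERBEROS_PRINCIPAL.varname)
  if (UserGroupInformation.isSecurityEnabled && principal != null && principal.nonEmpty) {
    val hive = Hive.get(new HiveConf(conf, classOf[HiveConf]))
    try {
      val user = UserGroupInformation.getCurrentUser.getUserName
      val tokenStr = hive.getDelegationToken(user, principal)
      val hiveToken = new Token[DelegationTokenIdentifier]()
      hiveToken.decodeFromUrlString(tokenStr)
      credentials.addToken(new Text("hive.server2.delegation.token"), hiveToken)
    } finally {
      Hive.closeCurrent()   // don't leak the thread-local metastore client
    }
  }
}
```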


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3665][GraphX] Java API for GraphX

2015-03-19 Thread kdatta
Github user kdatta commented on the pull request:

https://github.com/apache/spark/pull/3234#issuecomment-83823758
  
I had to add the JUnit dependency in graphx/pom.xml to compile. Did you see 
this issue? We might have to update the pom file.

-Kushal. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6202] [SQL] enable variable substitutio...

2015-03-19 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4930#discussion_r26811257
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---
@@ -151,7 +152,15 @@ class TestHiveContext(sc: SparkContext) extends 
HiveContext(sc) {
 
  val describedTable = "DESCRIBE (\\w+)".r
 
+  val vs = new VariableSubstitution()
+
   protected[hive] class HiveQLQueryExecution(hql: String)
+extends this.SubstitutedHiveQLQueryExecution(vs.substitute(hiveconf, hql))
+
+  // We should substitute variables in hql to pass the text to parseSql() as a parameter.
+  // The Hive parser needs the substituted text. HiveContext.sql() does this but returns
+  // a DataFrame, while we need a logicalPlan, so we cannot reuse that.
+  protected[hive] class SubstitutedHiveQLQueryExecution(hql: String)
 extends this.QueryExecution(HiveQl.parseSql(hql)) {
 def hiveExec() = runSqlHive(hql)
--- End diff --

@adrian-wang how about adding the substitution in `HiveContext.runSqlHive` 
or `HiveContext.runHive`? Then we probably wouldn't need to change anything in 
`TestHive`.
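
To make the substitution step concrete, a small sketch of Hive's VariableSubstitution as used in the diff; the query and variable here are made up, and substitution is on by default via hive.variable.substitute:

```scala
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.parse.VariableSubstitution

val hiveconf = new HiveConf()
hiveconf.set("tbl", "src")   // resolvable through the ${hiveconf:...} namespace

val vs = new VariableSubstitution()
val raw = "SELECT key FROM ${hiveconf:tbl} LIMIT 10"
val hql = vs.substitute(hiveconf, raw)
// hql == "SELECT key FROM src LIMIT 10", which is what HiveQl.parseSql() needs
```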


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6370][core] Documentation: Improve all ...

2015-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5097#issuecomment-83832497
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6354][SQL] Replace the plan which is pa...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5044#issuecomment-83393681
  
  [Test build #28859 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28859/consoleFull)
 for   PR 5044 at commit 
[`e65a19f`](https://github.com/apache/spark/commit/e65a19f4bd9a731ad9b75f387f96b3612f05f66f).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [MLLib]SPARK-6348:Enable useFeatureScaling in ...

2015-03-19 Thread tanyinyan
Github user tanyinyan commented on the pull request:

https://github.com/apache/spark/pull/5055#issuecomment-83407453
  
Yes, I have made this constructor and setter public.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

2015-03-19 Thread advancedxy
Github user advancedxy commented on the pull request:

https://github.com/apache/spark/pull/4783#issuecomment-83410489
  
@shivaram @srowen Tuple2(Int, Int) gets specialized to the Tuple2$mcII$sp class. 
But Tuple2$mcII$sp is a subclass of Tuple2, so in our implementation the 
specialized class picks up two additional object references (_1 and _2 from the 
superclass Tuple2, in our case). For Tuple2(Int, Int), SizeEstimator will 
therefore report 32 bytes rather than 24 bytes. In theory, the field layout of 
the Tuple2(1, 2) object should be something like below.
```
scala.Tuple2$mcII$sp object internals:
 OFFSET  SIZE    TYPE DESCRIPTION                              VALUE
      0     4         (object header)                          01 00 00 00 (00000001 00000000 00000000 00000000)
      4     4         (object header)                          00 00 00 00 (00000000 00000000 00000000 00000000)
      8     4         (object header)                          05 c3 00 f8 (00000101 11000011 00000000 11111000)
     12     4  Object Tuple2._1                                null
     16     4  Object Tuple2._2                                null
     20     4     int Tuple2$mcII$sp._1$mcI$sp                 1
     24     4     int Tuple2$mcII$sp._2$mcI$sp                 2
     28     4         (loss due to the next object alignment)
Instance size: 32 bytes (reported by Instrumentation API)
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
```

But in practice, the size of Tuple2(1, 2) is 24 bytes. So is there any 
Scala expert we can ping? I really want to know why Tuple2(1, 2) can be 24 
bytes when the specialized version is involved.
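
For anyone who wants to reproduce the dump above: it matches what jol prints. A hedged sketch, assuming a recent jol-core (which is not a Spark dependency) on the classpath:

```scala
// Requires "org.openjdk.jol" % "jol-core" on the classpath; run on a JDK.
import org.openjdk.jol.info.ClassLayout

object TupleLayout {
  def main(args: Array[String]): Unit = {
    val t = (1, 2)                  // instantiates scala.Tuple2$mcII$sp
    println(t.getClass.getName)
    // Prints per-field offsets/sizes, including the unused _1/_2 Object
    // slots inherited from the generic Tuple2 superclass.
    println(ClassLayout.parseInstance(t).toPrintable)
  }
}
```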


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6286][Mesos][minor] Handle missing Meso...

2015-03-19 Thread jongyoul
GitHub user jongyoul opened a pull request:

https://github.com/apache/spark/pull/5088

[SPARK-6286][Mesos][minor] Handle missing Mesos case TASK_ERROR

- Added TaskState.isFailed to handle TASK_LOST and TASK_ERROR, and synchronized 
CoarseMesosSchedulerBackend and MesosSchedulerBackend accordingly (see the sketch below)
- This is related to #5000 
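
For context, a sketch of the described shape, not the PR's actual diff: fold Mesos TASK_ERROR into the failed states and expose one isFailed predicate both backends can share. The enumeration mirrors org.apache.spark.TaskState; the Mesos protobuf enum is assumed on the classpath.

```scala
import org.apache.mesos.Protos.{TaskState => MesosTaskState}

object TaskStateSketch extends Enumeration {
  val LAUNCHING, RUNNING, FINISHED, FAILED, KILLED, LOST = Value

  // Single predicate both Mesos scheduler backends can call.
  def isFailed(state: Value): Boolean = state == LOST || state == FAILED

  // TASK_ERROR previously fell through the match; map it to FAILED.
  def fromMesos(state: MesosTaskState): Value = state match {
    case MesosTaskState.TASK_STAGING | MesosTaskState.TASK_STARTING => LAUNCHING
    case MesosTaskState.TASK_RUNNING => RUNNING
    case MesosTaskState.TASK_FINISHED => FINISHED
    case MesosTaskState.TASK_FAILED | MesosTaskState.TASK_ERROR => FAILED
    case MesosTaskState.TASK_KILLED => KILLED
    case MesosTaskState.TASK_LOST => LOST
  }
}
```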

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jongyoul/spark SPARK-6286-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5088.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5088


commit ac4336ae5988598a9ed663588606c410dd154480
Author: Jongyoul Lee jongy...@gmail.com
Date:   2015-03-19T09:13:28Z

[SPARK-6286][Mesos][minor] Handle missing Mesos case TASK_ERROR
- Made TaskState.isFailed for handling TASK_LOST and TASK_ERROR and 
synchronizing CoarseMesosSchedulerBackend and MesosSchedulerBackend




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

2015-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4610#issuecomment-83433562
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28860/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6286][Mesos][minor] Handle missing Meso...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5088#issuecomment-83440771
  
  [Test build #28865 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28865/consoleFull)
 for   PR 5088 at commit 
[`4f2362f`](https://github.com/apache/spark/commit/4f2362f55009688fae168ff22c0f5dfee22abda1).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4449][Core] Specify port range in spark

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3314#issuecomment-83440739
  
  [Test build #28866 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28866/consoleFull)
 for   PR 3314 at commit 
[`fa5bcbb`](https://github.com/apache/spark/commit/fa5bcbbb4215bef56006bbe0d1081a2c237b8b72).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

2015-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4610#issuecomment-83399347
  
  [Test build #28860 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28860/consoleFull)
 for   PR 4610 at commit 
[`c387fce`](https://github.com/apache/spark/commit/c387fcef43ed45bc6469902216c57b9937ae5a1d).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6408] [SQL] Fix JDBCRDD filtering strin...

2015-03-19 Thread ypcat
GitHub user ypcat opened a pull request:

https://github.com/apache/spark/pull/5087

[SPARK-6408] [SQL] Fix JDBCRDD filtering string literals



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ypcat/spark spark-6408

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5087.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5087


commit 896253457361c75ec8678950d9549ed4187f895b
Author: Pei-Lun Lee pl...@appier.com
Date:   2015-03-19T08:20:51Z

[SPARK-6408] [SQL] Fix filtering string literals
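
The gist of such a fix, sketched below; the helper names are illustrative of what JDBCRDD's filter compilation needs, not the patch itself. A string literal pushed into a WHERE clause must be single-quoted, with embedded quotes doubled, or the generated SQL is malformed.

```scala
// Double embedded single quotes so the literal survives inside '...'.
def escapeSql(value: String): String = value.replace("'", "''")

// Quote strings; leave numeric values untouched when rendering a filter.
def compileValue(value: Any): Any = value match {
  case stringValue: String => s"'${escapeSql(stringValue)}'"
  case other => other
}

// e.g. rendering an EqualTo("name", "O'Brien") filter:
val clause = s"name = ${compileValue("O'Brien")}"
// clause == "name = 'O''Brien'"
```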




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6397][SQL] Check the missingInput simpl...

2015-03-19 Thread haiyangsea
Github user haiyangsea commented on the pull request:

https://github.com/apache/spark/pull/5082#issuecomment-83411727
  
It looks like a great feature!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5259][CORE]Make sure mapStage.pendingta...

2015-03-19 Thread suyanNone
Github user suyanNone commented on the pull request:

https://github.com/apache/spark/pull/4055#issuecomment-83422662
  
This patch seems to have been forgotten...
@srowen @markhamstra @kayousterhout 
This patch can prevent the endless retry that may occur after an 
executor is killed or lost. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


