[GitHub] spark pull request: [SPARK-6392][SQL]Minor fix ClassNotFound excep...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/5079#issuecomment-83359940 The tests in the two PRs are different: this PR is about the UDF jar, but #4586 is about the SerDe jar. They may be loaded by different class loaders. @jeanlyn can you paste the full code for the UDF function? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
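As background for the "different class loaders" point above, here is a minimal, stand-alone Java sketch (hypothetical class and loader names, not Spark's actual loader hierarchy) showing how the same class lookup can succeed or throw `ClassNotFoundException` depending on which loader performs it:

```java
import java.net.URL;
import java.net.URLClassLoader;

public class LoaderDemo {
    public static void main(String[] args) throws Exception {
        // A child loader that delegates to the application loader can
        // resolve application classes such as this one.
        ClassLoader parent = LoaderDemo.class.getClassLoader();
        try (URLClassLoader child = new URLClassLoader(new URL[0], parent)) {
            System.out.println(Class.forName("LoaderDemo", true, child).getName());
        }

        // An isolated loader whose parent is the bootstrap loader cannot
        // see application classes, so the same lookup fails.
        try (URLClassLoader isolated = new URLClassLoader(new URL[0], null)) {
            Class.forName("LoaderDemo", true, isolated);
            System.out.println("unexpectedly found");
        } catch (ClassNotFoundException e) {
            System.out.println("not found by isolated loader");
        }
    }
}
```

This is why a jar registered with one loader (e.g. for a UDF) can still be invisible to code resolving classes through another (e.g. for a SerDe).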
[GitHub] spark pull request: [SPARK-4012] stop SparkContext when the except...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/5004#issuecomment-83369196 Cool, merging this into master. Thanks!
[GitHub] spark pull request: [SPARK-6392][SQL]Minor fix ClassNotFound excep...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/5079#issuecomment-83372666 @jeanlyn we are not seeing the same thing; even our .q files differ. I don't have CHAR in my .q file.
[GitHub] spark pull request: [SPARK-6222][Streaming] Dont delete checkpoint...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5008
[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-83371597 [Test build #28855 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28855/consoleFull) for PR 4491 at commit [`072c39b`](https://github.com/apache/spark/commit/072c39b26583c9793ec5e94b8430a903c84b1d91).
* This patch **fails Scala style tests**.
* This patch **does not merge cleanly**.
* This patch adds the following public classes _(experimental)_:
  * `abstract class AesCtrCryptoCodec extends CryptoCodec`
  * `case class CipherSuite(name: String, algoBlockSize: Int)`
  * `abstract case class CryptoCodec()`
  * `class CryptoInputStream(in: InputStream, codecVal: CryptoCodec,`
  * `class CryptoOutputStream(out: OutputStream, codecVal: CryptoCodec, bufferSizeVal: Int,`
  * `trait Decryptor`
  * `trait Encryptor`
  * `class JceAesCtrCryptoCodec(conf:SparkConf) extends AesCtrCryptoCodec with Logging`
  * `class JceAesCtrCipher(mode: Int, provider: String) extends Encryptor with Decryptor`
  * `class OpensslAesCtrCryptoCodec(conf:SparkConf) extends AesCtrCryptoCodec with Logging`
[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-83371598 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28855/ Test FAILed.
[GitHub] spark pull request: [MLLib]SPARK-6348:Enable useFeatureScaling in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5055#issuecomment-83361565 [Test build #28854 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28854/consoleFull) for PR 5055 at commit [`2dc9cb8`](https://github.com/apache/spark/commit/2dc9cb886eaaf27f3bdf761b17da18692ead0906).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-83381772 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28858/ Test FAILed.
[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-83381297 [Test build #28858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28858/consoleFull) for PR 4491 at commit [`2278b48`](https://github.com/apache/spark/commit/2278b48cb7b7bd306432f3f459212fed5b1cf3bd).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4012] stop SparkContext when the except...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5004
[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-83381766 [Test build #28858 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28858/consoleFull) for PR 4491 at commit [`2278b48`](https://github.com/apache/spark/commit/2278b48cb7b7bd306432f3f459212fed5b1cf3bd).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `abstract class AesCtrCryptoCodec extends CryptoCodec`
  * `case class CipherSuite(name: String, algoBlockSize: Int)`
  * `abstract case class CryptoCodec()`
  * `class CryptoInputStream(in: InputStream, codecVal: CryptoCodec,`
  * `class CryptoOutputStream(out: OutputStream, codecVal: CryptoCodec, bufferSizeVal: Int,`
  * `trait Decryptor`
  * `trait Encryptor`
  * `class JceAesCtrCryptoCodec(conf:SparkConf) extends AesCtrCryptoCodec with Logging`
  * `class JceAesCtrCipher(mode: Int, provider: String) extends Encryptor with Decryptor`
  * `class OpensslAesCtrCryptoCodec(conf:SparkConf) extends AesCtrCryptoCodec with Logging`
[GitHub] spark pull request: [SPARK-6354][SQL] Replace the plan which is pa...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/5044#issuecomment-83384314 @marmbrus I have updated the design on the JIRA.
[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-83370946 [Test build #28855 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28855/consoleFull) for PR 4491 at commit [`072c39b`](https://github.com/apache/spark/commit/072c39b26583c9793ec5e94b8430a903c84b1d91).
* This patch **does not merge cleanly**.
[GitHub] spark pull request: [SPARK-6392][SQL]Minor fix ClassNotFound excep...
Github user jeanlyn commented on the pull request: https://github.com/apache/spark/pull/5079#issuecomment-83372419 @chenghao-intel my full code is

```java
import org.apache.hadoop.hive.ql.exec.UDF;

public class hello extends UDF {
    public String evaluate(String str) {
        try {
            return "hello" + str;
        } catch (Exception e) {
            return null;
        }
    }
}
```

@adrian-wang, I also tested `mapjoin_addjar.q` in `spark-sql`. I got this exception on `CREATE TABLE`:

```
15/03/19 14:41:36 ERROR DDLTask: java.lang.NoSuchFieldError: CHAR
at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:310)
at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:277)
at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
```

But it does not seem to be a jar-loading problem, because when I do not run

```
add jar ${system:maven.local.repository}/org/apache/hive/hcatalog/hive-hcatalog-core/${system:hive.version}/hive-hcatalog-core-${system:hive.version}.jar;
```

I get the following exception:

```
15/03/19 14:54:51 ERROR DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot validate serde: org.apache.hive.hcatalog.data.JsonSerDe
at org.apache.hadoop.hive.ql.exec.DDLTask.validateSerDe(DDLTask.java:3423)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3553)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:252)
```
[GitHub] spark pull request: [SPARK-4449][Core] Specify port range in spark
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3314#issuecomment-83372401 [Test build #28856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28856/consoleFull) for PR 3314 at commit [`6e609da`](https://github.com/apache/spark/commit/6e609daecc2c22c8a2123c628c0deca886b167a6).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-83379582 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28857/ Test FAILed.
[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-83379060 [Test build #28857 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28857/consoleFull) for PR 4491 at commit [`0d759d1`](https://github.com/apache/spark/commit/0d759d129213079da49714183a03fcbd97acc180).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-83379569 [Test build #28857 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28857/consoleFull) for PR 4491 at commit [`0d759d1`](https://github.com/apache/spark/commit/0d759d129213079da49714183a03fcbd97acc180).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `abstract class AesCtrCryptoCodec extends CryptoCodec`
  * `case class CipherSuite(name: String, algoBlockSize: Int)`
  * `abstract case class CryptoCodec()`
  * `class CryptoInputStream(in: InputStream, codecVal: CryptoCodec,`
  * `class CryptoOutputStream(out: OutputStream, codecVal: CryptoCodec, bufferSizeVal: Int,`
  * `trait Decryptor`
  * `trait Encryptor`
  * `class JceAesCtrCryptoCodec(conf:SparkConf) extends AesCtrCryptoCodec with Logging`
  * `class JceAesCtrCipher(mode: Int, provider: String) extends Encryptor with Decryptor`
  * `class OpensslAesCtrCryptoCodec(conf:SparkConf) extends AesCtrCryptoCodec with Logging`
[GitHub] spark pull request: [SPARK-6392][SQL]Minor fix ClassNotFound excep...
Github user jeanlyn commented on the pull request: https://github.com/apache/spark/pull/5079#issuecomment-83383402 I also don't have CHAR in `mapjoin_addjar.q`. I only found one `mapjoin_addjar.q`, and its path is sql/hive/src/test/resources/ql/src/test/queries/clientpositive/mapjoin_addjar.q

```sql
set hive.auto.convert.join=true;
set hive.auto.convert.join.use.nonstaged=false;

add jar ${system:maven.local.repository}/org/apache/hive/hcatalog/hive-hcatalog-core/${system:hive.version}/hive-hcatalog-core-${system:hive.version}.jar;

CREATE TABLE t1 (a string, b string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';

LOAD DATA LOCAL INPATH ../../data/files/sample.json INTO TABLE t1;
select * from src join t1 on src.key = t1.a;
drop table t1;

set hive.auto.convert.join=false;
```

Maybe we can discuss this offline?
[GitHub] spark pull request: [SPARK-4449][Core] Specify port range in spark
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3314#issuecomment-83372560 [Test build #28856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28856/consoleFull) for PR 3314 at commit [`6e609da`](https://github.com/apache/spark/commit/6e609daecc2c22c8a2123c628c0deca886b167a6).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class CheckpointWriteHandler(`
[GitHub] spark pull request: [SPARK-4449][Core] Specify port range in spark
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3314#issuecomment-83372561 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28856/ Test FAILed.
[GitHub] spark pull request: Tighten up field/method visibility in Executor...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/4850#discussion_r26803803

--- Diff: core/src/main/scala/org/apache/spark/executor/CommitDeniedException.scala ---
```diff
@@ -22,14 +22,12 @@ import org.apache.spark.{TaskCommitDenied, TaskEndReason}
 /**
  * Exception thrown when a task attempts to commit output to HDFS but is denied by the driver.
  */
-class CommitDeniedException(
+private[spark] class CommitDeniedException(
```
--- End diff --

Since this was inadvertently public before, and thus was public in Spark 1.3, I think that this change will cause a MiMa failure once we bump the version to 1.4.0-SNAPSHOT. Therefore, this PR sort of implicitly conflicts with #5056, so we'll have to make sure to re-test whichever PR we merge second.
[GitHub] spark pull request: Additional information for users building from ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5092#issuecomment-83789238 OK I am convinced, merge it. I think both hive profiles are needed in this example?
[GitHub] spark pull request: Tighten up field/method visibility in Executor...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/4850#discussion_r26804299

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
```diff
@@ -156,12 +160,19 @@ private[spark] class Executor(
     serializedTask: ByteBuffer)
   extends Runnable {
+  /** Whether this task has been killed. */
   @volatile private var killed = false
-  @volatile var task: Task[Any] = _
-  @volatile var attemptedTask: Option[Task[Any]] = None
```
--- End diff --

This `attemptedTask` vs `task` stuff in the old code is/was really confusing, so thanks for cleaning it up.
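The `@volatile` kill flag in the diff above relies on the JVM's volatile visibility guarantee. Here is a minimal, stand-alone Java sketch of the same pattern (hypothetical names, not Spark code): one thread sets a kill flag, another spins until it observes the write. Without `volatile`, the worker could cache the stale `false` indefinitely.

```java
public class KillFlagDemo {
    // volatile guarantees the worker thread sees the writer's update.
    private static volatile boolean killed = false;

    public static void main(String[] args) throws Exception {
        Thread worker = new Thread(() -> {
            while (!killed) {
                Thread.yield(); // stand-in for task work
            }
            System.out.println("task observed kill");
        });
        worker.start();
        Thread.sleep(50);
        killed = true; // the "kill" request from another thread
        worker.join();
        System.out.println("done");
    }
}
```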
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-83796187 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28896/ Test FAILed.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-83796166 [Test build #28896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28896/consoleFull) for PR 5093 at commit [`126ce61`](https://github.com/apache/spark/commit/126ce61580d805b464f7a4534d0be05411ff0e4b).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-83796175 [Test build #28896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28896/consoleFull) for PR 5093 at commit [`126ce61`](https://github.com/apache/spark/commit/126ce61580d805b464f7a4534d0be05411ff0e4b).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5654] Integrate SparkR into Apache Spar...
Github user davies closed the pull request at: https://github.com/apache/spark/pull/5077
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-83775810 [Test build #2 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/2/consoleFull) for PR 5093 at commit [`94d3547`](https://github.com/apache/spark/commit/94d35478c8205386ac4ff0e265a0bfbb073bc8c7).
* This patch merges cleanly.
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an Arti...
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-83776903 Yeah, if it can be parallelized by data, it's best to do that and avoid any GraphX joins, because with GraphX the painful part is balancing the graph, and most of the time that step needs more work than the rest of it :-(
[GitHub] spark pull request: Tighten up field/method visibility in Executor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4850#issuecomment-83783774 [Test build #28892 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28892/consoleFull) for PR 4850 at commit [`866fc60`](https://github.com/apache/spark/commit/866fc60652c3f98c6c608ca6c25c33f4219a540c).
* This patch merges cleanly.
[GitHub] spark pull request: [Core] SPARK-5954: Top by key
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5075#issuecomment-83783526 [Test build #28891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28891/consoleFull) for PR 5075 at commit [`82dded9`](https://github.com/apache/spark/commit/82dded96926f98d8a72cf40cbbc6987b191962f0).
* This patch merges cleanly.
[GitHub] spark pull request: [Core] SPARK-5954: Top by key
Github user coderxiang commented on the pull request: https://github.com/apache/spark/pull/5075#issuecomment-83783677 @rxin @mengxr per the comments, I created `MLPairRDDFunctions.scala` and moved the function there in the update.
[GitHub] spark pull request: Tighten up field/method visibility in Executor...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4850#issuecomment-83787927 LGTM.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-83788993 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28894/ Test FAILed.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-83788987 [Test build #28894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28894/consoleFull) for PR 5093 at commit [`63a35c9`](https://github.com/apache/spark/commit/63a35c908598074bb0acb2e310a4905fe28502a0). * This patch merges cleanly.
[GitHub] spark pull request: Driver's Block Manager does not use spark.dri...
GitHub user marsishandsome opened a pull request: https://github.com/apache/spark/pull/5095 Driver's Block Manager does not use spark.driver.host in Yarn-Client mode

In my cluster, the YARN nodes do not know the client's host name, so I set spark.driver.host to the client's IP address. But in Yarn-Client mode the driver's Block Manager uses the hostname rather than spark.driver.host, and I got the following error:

TaskSetManager: Lost task 1.1 in stage 0.0 (TID 2, hadoop-node1538098): java.io.IOException: Failed to connect to example-hostname
  at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
  at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
  at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
  at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
  at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
  at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.UnresolvedAddressException
  at sun.nio.ch.Net.checkAddress(Net.java:127)
  at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:644)
  at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:193)
  at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:200)
  at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1029)
  at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:496)
  at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:481)
  at io.netty.channel.ChannelOutboundHandlerAdapter.connect(ChannelOutboundHandlerAdapter.java:47)
  at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:496)
  at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:481)
  at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:463)
  at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:849)
  at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:199)
  at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:165)
  at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
  at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
  ... 1 more

You can merge this pull request into a Git repository by running: $ git pull https://github.com/marsishandsome/spark Spark6420 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5095.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5095

commit 2f9701d182eecc814df1730cb659fbe1622d1288 Author: guliangliang guliangli...@qiyi.com Date: 2015-03-19T23:11:17Z [SPARK-6420] Driver's Block Manager does not use spark.driver.host in Yarn-Client mode
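The root cause in the trace above is the executor failing to resolve the driver's advertised hostname. A minimal, hypothetical sketch of the failure mode in plain Scala (outside Spark, names are illustrative): if the advertised name cannot be resolved, Netty's connect path surfaces `UnresolvedAddressException`, which is why advertising a routable IP via `spark.driver.host` helps.

```scala
import java.net.InetAddress
import scala.util.Try

// Sketch only: executors must be able to resolve whatever address the driver
// advertises. If resolution fails, connection setup dies before any data moves.
object DriverHostCheck {
  def resolvable(host: String): Boolean =
    Try(InetAddress.getByName(host)).isSuccess
}
```

Setting `spark.driver.host` to an IP address the cluster can route to sidesteps the hostname lookup entirely.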
[GitHub] spark pull request: [SPARK-5654] Integrate SparkR
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5096#discussion_r26809195 --- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala --- @@ -0,0 +1,515 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.api.r + +import java.io._ +import java.net.ServerSocket +import java.util.{Map = JMap} + +import scala.collection.JavaConversions._ +import scala.io.Source +import scala.reflect.ClassTag +import scala.util.Try + +import org.apache.spark.api.java.{JavaPairRDD, JavaRDD, JavaSparkContext} +import org.apache.spark.broadcast.Broadcast +import org.apache.spark.rdd.RDD +import org.apache.spark._ + +private abstract class BaseRRDD[T: ClassTag, U: ClassTag]( +parent: RDD[T], +numPartitions: Int, +func: Array[Byte], +deserializer: String, +serializer: String, +packageNames: Array[Byte], +rLibDir: String, +broadcastVars: Array[Broadcast[Object]]) + extends RDD[U](parent) with Logging { + override def getPartitions = parent.partitions + + override def compute(split: Partition, context: TaskContext): Iterator[U] = { + +// The parent may be also an RRDD, so we should launch it first. 
+val parentIterator = firstParent[T].iterator(split, context) + +// we expect two connections +val serverSocket = new ServerSocket(0, 2) +val listenPort = serverSocket.getLocalPort() + +// The stdout/stderr is shared by multiple tasks, because we use one daemon +// to launch child process as worker. +val errThread = RRDD.createRWorker(rLibDir, listenPort) + +// We use two sockets to separate input and output, then it's easy to manage +// the lifecycle of them to avoid deadlock. +// TODO: optimize it to use one socket + +// the socket used to send out the input of task +serverSocket.setSoTimeout(1) +val inSocket = serverSocket.accept() +startStdinThread(inSocket.getOutputStream(), parentIterator, split.index) + +// the socket used to receive the output of task +val outSocket = serverSocket.accept() +val inputStream = new BufferedInputStream(outSocket.getInputStream) +val dataStream = openDataStream(inputStream) +serverSocket.close() + +try { + + return new Iterator[U] { +def next(): U = { + val obj = _nextObj + if (hasNext) { +_nextObj = read() + } + obj +} + +var _nextObj = read() + +def hasNext(): Boolean = { + val hasMore = (_nextObj != null) + if (!hasMore) { +dataStream.close() + } + hasMore +} + } +} catch { + case e: Exception = +throw new SparkException(R computation failed with\n + errThread.getLines()) +} + } + + /** + * Start a thread to write RDD data to the R process. + */ + private def startStdinThread[T]( +output: OutputStream, +iter: Iterator[T], +splitIndex: Int) = { --- End diff -- I think that Spark has migrated away from split in favor of partition, so it would be nice to update the occurrences of split in this PR to be consistent with that. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] spark pull request: [SQL] Checking data types when resolving types
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4685#issuecomment-83823596 @kai-zeng in `HiveTypeCoercion` there are many rules that guarantee/produce the correct data type for built-in expressions like the ones here. Instead of adding a check here, can we just keep updating/adding rules in `HiveTypeCoercion`? In most cases, what people need is the correct cast, not a thrown exception, right?
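To illustrate the idea behind such coercion rules, here is a toy widening sketch (this is NOT Catalyst's actual `HiveTypeCoercion` API; the types and the `widen` helper are invented for illustration): a rule picks a common wider type for two operands instead of throwing.

```scala
// Toy model of type widening: prefer a common wider type over an exception.
sealed trait DataType
case object IntType    extends DataType
case object LongType   extends DataType
case object DoubleType extends DataType
case object StringType extends DataType

object Coercion {
  // Returns the widened common type, or None if no safe coercion exists.
  def widen(a: DataType, b: DataType): Option[DataType] = (a, b) match {
    case (x, y) if x == y                          => Some(x)
    case (IntType, LongType) | (LongType, IntType) => Some(LongType)
    case (DoubleType, IntType) | (IntType, DoubleType)
       | (DoubleType, LongType) | (LongType, DoubleType) => Some(DoubleType)
    case _                                         => None
  }
}
```

A resolver built on this shape only fails when `widen` returns `None`, which matches the "cast where possible" behavior the comment asks for.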
[GitHub] spark pull request: [SPARK-6406] Launcher backward compatibility i...
GitHub user nishkamravi2 reopened a pull request: https://github.com/apache/spark/pull/5085 [SPARK-6406] Launcher backward compatibility issue -- hadoop should not be mandatory in spark assembly name

You can merge this pull request into a Git repository by running: $ git pull https://github.com/nishkamravi2/spark master_nravi Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5085.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5085

commit 681b36f5fb63e14dc89e17813894227be9e2324f Author: nravi nr...@c1704.halxg.cloudera.com Date: 2014-05-08T07:05:33Z Fix for SPARK-1758: failing test org.apache.spark.JavaAPISuite.wholeTextFiles The prefix "file:" is missing in the string inserted as key in HashMap
commit 5108700230fd70b995e76598f49bdf328c971e77 Author: nravi nr...@c1704.halxg.cloudera.com Date: 2014-06-03T22:25:22Z Fix in Spark for the Concurrent thread modification issue (SPARK-1097, HADOOP-10456)
commit 6b840f017870207d23e75de224710971ada0b3d0 Author: nravi nr...@c1704.halxg.cloudera.com Date: 2014-06-03T22:34:02Z Undo the fix for SPARK-1758 (the problem is fixed)
commit df2aeb179fca4fc893803c72a657317f5b5539d7 Author: nravi nr...@c1704.halxg.cloudera.com Date: 2014-06-09T19:02:59Z Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456)
commit eb663ca20c73f9c467192c95fc528c6f55f202be Author: nravi nr...@c1704.halxg.cloudera.com Date: 2014-06-09T19:04:39Z Merge branch 'master' of https://github.com/apache/spark
commit 5423a03ddf4d747db7261d08a64e32f44e8be95e Author: nravi nr...@c1704.halxg.cloudera.com Date: 2014-06-10T20:06:07Z Merge branch 'master' of https://github.com/apache/spark
commit 3bf8fad85813037504189cf1323d381fefb6dfbe Author: nravi nr...@c1704.halxg.cloudera.com Date: 2014-06-16T05:47:00Z Merge branch 'master' of https://github.com/apache/spark
commit 2b630f94079b82df3ebae2b26a3743112afcd526 Author: nravi nr...@c1704.halxg.cloudera.com Date: 2014-06-16T06:00:31Z Accept memory input as 30g, 512M instead of an int value, to be consistent with rest of Spark
commit efd688a4e15b79e92d162073035b03362fcf66f0 Author: Nishkam Ravi nr...@cloudera.com Date: 2014-07-13T00:04:17Z Merge branch 'master' of https://github.com/apache/spark
commit 2e69f112d1be59951cd32da4127d8b51bfa03338 Author: Nishkam Ravi nr...@cloudera.com Date: 2014-09-21T23:17:15Z Merge branch 'master' of https://github.com/apache/spark into master_nravi
commit ebcde10252e6c45169ea086e8426ec9997d46490 Author: Nishkam Ravi nr...@cloudera.com Date: 2014-09-22T06:44:40Z Modify default YARN memory_overhead -- from an additive constant to a multiplier (redone to resolve merge conflicts)
commit 1cf2d1ef57ed6d783df06dad36b9505bc74329fb Author: nishkamravi2 nishkamr...@gmail.com Date: 2014-09-22T08:54:33Z Update YarnAllocator.scala
commit f00fa311945c1eafa8957eae5c84719521761dcd Author: Nishkam Ravi nr...@cloudera.com Date: 2014-09-22T23:06:07Z Improving logging for AM memoryOverhead
commit c726bd9f707ce182ec8d56ffecf9da87dcdb3091 Author: Nishkam Ravi nr...@cloudera.com Date: 2014-09-25T01:19:32Z Merge branch 'master' of https://github.com/apache/spark into master_nravi Conflicts: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala
commit 362da5edfd04bd8bad990fb210a9e11b8494fa62 Author: Nishkam Ravi nr...@cloudera.com Date: 2014-09-25T19:56:13Z Additional changes for yarn memory overhead
commit 42c2c3d18862d3632c20931ecfe2c64883c5febf Author: Nishkam Ravi nr...@cloudera.com Date: 2014-09-25T20:02:49Z Additional changes for yarn memory overhead issue
commit dac1047995c99f5a2670f934eb8d3a4ad9b532c8 Author: Nishkam Ravi nr...@cloudera.com Date: 2014-09-25T21:20:38Z Additional documentation for yarn memory overhead issue
commit 5ac2ec11629e19030ad5577da1eee2d135cc3d1c Author: Nishkam Ravi nr...@cloudera.com Date: 2014-09-25T21:25:44Z Remove out
commit 35daa6498048cabb736316e2f19e565c99243b7e Author: Nishkam Ravi nr...@cloudera.com Date: 2014-09-25T21:59:22Z Slight change in the doc for yarn memory overhead
commit 8f76c8b46379736aeb7dbe1a4d88729424a041f7 Author: Nishkam Ravi nr...@cloudera.com Date: 2014-09-25T22:03:00Z Doc change for yarn memory overhead
commit 636a9ffeb4a4ae0b941edd849dcbabf38821db53 Author: nishkamravi2 nishkamr...@gmail.com Date: 2014-09-30T18:33:28Z Update YarnAllocator.scala
commit 5f8f9ede0fda5c7a4f6a411c746a3d893f550524 Author: Nishkam Ravi nr...@cloudera.com Date: 2014-11-19T01:46:58Z Merge branch 'master' of https://github.com/apache/spark into master_nravi Conflicts: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala
[GitHub] spark pull request: [SPARK-6406] Launcher backward compatibility i...
Github user nishkamravi2 commented on the pull request: https://github.com/apache/spark/pull/5085#issuecomment-83825207 And btw, we need to check this in.
[GitHub] spark pull request: [SPARK-6406] Launcher backward compatibility i...
Github user nishkamravi2 closed the pull request at: https://github.com/apache/spark/pull/5085
[GitHub] spark pull request: [SPARK-6367][SQL] Use the proper data type for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5094#issuecomment-83778257 [Test build #28889 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28889/consoleFull) for PR 5094 at commit [`a384b51`](https://github.com/apache/spark/commit/a384b510c0cbfbf44855d2939aae737c26c20c85). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6367][SQL] Use the proper data type for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5094#issuecomment-83778614 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28889/ Test FAILed.
[GitHub] spark pull request: [SPARK-6367][SQL] Use the proper data type for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5094#issuecomment-83778607 [Test build #28889 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28889/consoleFull) for PR 5094 at commit [`a384b51`](https://github.com/apache/spark/commit/a384b510c0cbfbf44855d2939aae737c26c20c85). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [MLlib] SPARK-5954: Top by key
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5075#issuecomment-83795883 [Test build #28897 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28897/consoleFull) for PR 5075 at commit [`a80e0ec`](https://github.com/apache/spark/commit/a80e0ecd0ce96ffbeeaeb933dea1cada60e5863c). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5177][Build] Adds parameters for specif...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3980#issuecomment-83797419 @srowen Don't worry, I'm gradually merging the changes from this PR into #4851. An [experimental Jenkins builder] [1] was also set up for this. This is still WIP because some Hive 12 tests are still failing. I'm closing this one. [1]: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT_experimental/
[GitHub] spark pull request: [SPARK-5177][Build] Adds parameters for specif...
Github user liancheng closed the pull request at: https://github.com/apache/spark/pull/3980
[GitHub] spark pull request: [MLlib] SPARK-5954: Top by key
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5075#issuecomment-83797579 [Test build #28898 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28898/consoleFull) for PR 5075 at commit [`6f565c0`](https://github.com/apache/spark/commit/6f565c07aba25c18186c53eb329f56604baeb480). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5654] Integrate SparkR into Apache Spar...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/5077#issuecomment-83803157 Closing this one; @shivaram will open a new one.
[GitHub] spark pull request: [SPARK-5654] Integrate SparkR
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5096#discussion_r26808894 --- Diff: core/src/main/scala/org/apache/spark/api/r/RBackend.scala --- @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.api.r + +import java.io.{DataOutputStream, File, FileOutputStream, IOException} +import java.net.{InetSocketAddress, ServerSocket} +import java.util.concurrent.TimeUnit + +import io.netty.bootstrap.ServerBootstrap +import io.netty.channel.{ChannelFuture, ChannelInitializer, EventLoopGroup} +import io.netty.channel.nio.NioEventLoopGroup +import io.netty.channel.socket.SocketChannel +import io.netty.channel.socket.nio.NioServerSocketChannel +import io.netty.handler.codec.LengthFieldBasedFrameDecoder +import io.netty.handler.codec.bytes.{ByteArrayDecoder, ByteArrayEncoder} + +import org.apache.spark.Logging + +/** + * Netty-based backend server that is used to communicate between R and Java. 
+ */
+private[spark] class RBackend {
+
+  var channelFuture: ChannelFuture = null
+  var bootstrap: ServerBootstrap = null
+  var bossGroup: EventLoopGroup = null
+
+  def init(): Int = {
+    bossGroup = new NioEventLoopGroup(2)
+    val workerGroup = bossGroup
+    val handler = new RBackendHandler(this)
+
+    bootstrap = new ServerBootstrap()
+      .group(bossGroup, workerGroup)
+      .channel(classOf[NioServerSocketChannel])
+
+    bootstrap.childHandler(new ChannelInitializer[SocketChannel]() {
+      def initChannel(ch: SocketChannel) = {
+        ch.pipeline()
+          .addLast("encoder", new ByteArrayEncoder())
+          .addLast("frameDecoder",
+            // maxFrameLength = 2G
+            // lengthFieldOffset = 0
+            // lengthFieldLength = 4
+            // lengthAdjustment = 0
+            // initialBytesToStrip = 4, i.e. strip out the length field itself
+            new LengthFieldBasedFrameDecoder(Integer.MAX_VALUE, 0, 4, 0, 4))
+          .addLast("decoder", new ByteArrayDecoder())
+          .addLast("handler", handler)
+      }
+    })
+
+    channelFuture = bootstrap.bind(new InetSocketAddress(0))
+    channelFuture.syncUninterruptibly()
+    channelFuture.channel().localAddress().asInstanceOf[InetSocketAddress].getPort()
+  }
+
+  def run() = {
+    channelFuture.channel.closeFuture().syncUninterruptibly()
+  }
+
+  def close() = {
--- End diff --

Add a `: Unit =`.
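The frame-decoder parameters in the diff above describe a simple wire format: each message is prefixed by a 4-byte big-endian length field, which the decoder strips before handing the payload downstream. A standalone sketch of encoding that framing (illustrative only, not RBackend's actual code):

```scala
import java.nio.ByteBuffer

// Sketch of the 4-byte length-prefixed framing LengthFieldBasedFrameDecoder expects:
// [length: Int, big-endian][payload bytes].
object Framing {
  def frame(payload: Array[Byte]): Array[Byte] =
    ByteBuffer
      .allocate(4 + payload.length) // ByteBuffer is big-endian by default
      .putInt(payload.length)
      .put(payload)
      .array()
}
```

With `initialBytesToStrip = 4`, the receiver sees only the payload; `maxFrameLength = Integer.MAX_VALUE` is what the "2G" comment refers to.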
[GitHub] spark pull request: [SPARK-5654] Integrate SparkR
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5096#discussion_r26808896 --- Diff: core/src/main/scala/org/apache/spark/api/r/RBackend.scala --- @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.api.r + +import java.io.{DataOutputStream, File, FileOutputStream, IOException} +import java.net.{InetSocketAddress, ServerSocket} +import java.util.concurrent.TimeUnit + +import io.netty.bootstrap.ServerBootstrap +import io.netty.channel.{ChannelFuture, ChannelInitializer, EventLoopGroup} +import io.netty.channel.nio.NioEventLoopGroup +import io.netty.channel.socket.SocketChannel +import io.netty.channel.socket.nio.NioServerSocketChannel +import io.netty.handler.codec.LengthFieldBasedFrameDecoder +import io.netty.handler.codec.bytes.{ByteArrayDecoder, ByteArrayEncoder} + +import org.apache.spark.Logging + +/** + * Netty-based backend server that is used to communicate between R and Java. + */ +private[spark] class RBackend { + + var channelFuture: ChannelFuture = null --- End diff -- `private` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] spark pull request: [MLlib] SPARK-5954: Top by key
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5075#issuecomment-83823945 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28898/ Test PASSed.
[GitHub] spark pull request: [MLlib] SPARK-5954: Top by key
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5075#issuecomment-83823917 [Test build #28898 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28898/consoleFull) for PR 5075 at commit [`6f565c0`](https://github.com/apache/spark/commit/6f565c07aba25c18186c53eb329f56604baeb480). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class MLPairRDDFunctions[K: ClassTag, V: ClassTag](self: RDD[(K, V)]) extends Serializable `
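For readers following the "top by key" discussion: a hypothetical local-collection sketch of the semantics (plain Scala over a `Seq`, NOT the MLlib `MLPairRDDFunctions` implementation, which operates on an `RDD`): for each key, keep the `num` largest values in descending order.

```scala
object TopByKeySketch {
  // For each key, the top `num` values by the implicit ordering, descending.
  def topByKey[K, V](pairs: Seq[(K, V)], num: Int)(implicit ord: Ordering[V]): Map[K, Seq[V]] =
    pairs
      .groupBy(_._1)
      .map { case (k, kvs) => k -> kvs.map(_._2).sorted(ord.reverse).take(num) }
}
```

The RDD version would typically use a bounded priority queue per key via `aggregateByKey` to avoid materializing all values, but the per-key result is the same.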
[GitHub] spark pull request: [SPARK-6406] Launcher backward compatibility i...
Github user nishkamravi2 commented on the pull request: https://github.com/apache/spark/pull/5085#issuecomment-83824416 Please ignore the comment above (I misread the regex). However, we do need to relax the check on hadoop. CDH itself names the outermost jar spark-assembly.jar. As to why we did not catch this issue with compute-classpath.sh, short answer: because we had our own custom version of it.
[GitHub] spark pull request: [SPARK-6370][core] Documentation: Improve all ...
GitHub user mbonaci opened a pull request: https://github.com/apache/spark/pull/5097 [SPARK-6370][core] Documentation: Improve all 3 docs for RDD.sample The docs for the `sample` method were insufficient, now less so. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mbonaci/spark-1 master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5097.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5097 commit a6a9d9756584ec503b4c4e3a25bbae4b2944c3a7 Author: mbonaci mbon...@gmail.com Date: 2015-03-20T00:39:22Z [SPARK-6370][core] Documentation: Improve all 3 docs for RDD.sample method
[GitHub] spark pull request: [SPARK-5654] Integrate SparkR
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5096#issuecomment-83808522 [Test build #28899 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28899/consoleFull) for PR 5096 at commit [`3eacfc0`](https://github.com/apache/spark/commit/3eacfc072758a445d1f01b29001c69683ac5b457). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5654] Integrate SparkR
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/5096#issuecomment-83807329 @pwendell @rxin We might push some more fixes as they come in, but I think this should be ready for review
[GitHub] spark pull request: [SPARK-5654] Integrate SparkR
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5096#discussion_r26808779 --- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala --- @@ -0,0 +1,515 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.api.r + +import java.io._ +import java.net.ServerSocket +import java.util.{Map => JMap} + +import scala.collection.JavaConversions._ +import scala.io.Source +import scala.reflect.ClassTag +import scala.util.Try + +import org.apache.spark.api.java.{JavaPairRDD, JavaRDD, JavaSparkContext} +import org.apache.spark.broadcast.Broadcast +import org.apache.spark.rdd.RDD +import org.apache.spark._ + +private abstract class BaseRRDD[T: ClassTag, U: ClassTag]( +parent: RDD[T], +numPartitions: Int, +func: Array[Byte], +deserializer: String, +serializer: String, +packageNames: Array[Byte], +rLibDir: String, +broadcastVars: Array[Broadcast[Object]]) + extends RDD[U](parent) with Logging { + override def getPartitions = parent.partitions + + override def compute(split: Partition, context: TaskContext): Iterator[U] = { + +// The parent may also be an RRDD, so we should launch it first.
+val parentIterator = firstParent[T].iterator(split, context) + +// we expect two connections +val serverSocket = new ServerSocket(0, 2) +val listenPort = serverSocket.getLocalPort() + +// The stdout/stderr is shared by multiple tasks, because we use one daemon +// to launch child process as worker. +val errThread = RRDD.createRWorker(rLibDir, listenPort) + +// We use two sockets to separate input and output, then it's easy to manage +// the lifecycle of them to avoid deadlock. +// TODO: optimize it to use one socket + +// the socket used to send out the input of task +serverSocket.setSoTimeout(1) +val inSocket = serverSocket.accept() +startStdinThread(inSocket.getOutputStream(), parentIterator, split.index) + +// the socket used to receive the output of task +val outSocket = serverSocket.accept() +val inputStream = new BufferedInputStream(outSocket.getInputStream) +val dataStream = openDataStream(inputStream) +serverSocket.close() + +try { + + return new Iterator[U] { +def next(): U = { + val obj = _nextObj + if (hasNext) { +_nextObj = read() + } + obj +} + +var _nextObj = read() + +def hasNext(): Boolean = { + val hasMore = (_nextObj != null) + if (!hasMore) { +dataStream.close() + } + hasMore +} + } +} catch { + case e: Exception => +throw new SparkException("R computation failed with\n" + errThread.getLines()) +} + } + + /** + * Start a thread to write RDD data to the R process.
+ */ + private def startStdinThread[T]( +output: OutputStream, +iter: Iterator[T], +splitIndex: Int) = { + +val env = SparkEnv.get +val bufferSize = System.getProperty("spark.buffer.size", "65536").toInt +val stream = new BufferedOutputStream(output, bufferSize) + +new Thread("writer for R") { + override def run() { +try { + SparkEnv.set(env) + val dataOut = new DataOutputStream(stream) + dataOut.writeInt(splitIndex) + + SerDe.writeString(dataOut, deserializer) + SerDe.writeString(dataOut, serializer) + + dataOut.writeInt(packageNames.length) + dataOut.write(packageNames) + + dataOut.writeInt(func.length) + dataOut.write(func) + + dataOut.writeInt(broadcastVars.length) +
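The quoted `compute` method opens one listening socket and accepts exactly two connections, one dedicated to task input and one to task output, so each direction can be written, drained, and closed independently without deadlocking. Below is a minimal Python sketch of that two-socket pattern; all names are made up for illustration, and it omits the SparkR serialization protocol entirely.

```python
import socket
import threading

def worker(port):
    # Plays the role of the child worker process: the first connection it
    # opens carries the task input, the second carries the task output.
    fin = socket.create_connection(("127.0.0.1", port))
    fout = socket.create_connection(("127.0.0.1", port))
    data = b""
    while True:                      # read input until the driver closes it
        chunk = fin.recv(1024)
        if not chunk:
            break
        data += chunk
    fin.close()
    fout.sendall(data.upper())       # write the "computed" output
    fout.close()

def compute(items):
    # Driver side: one listening socket, two accepted connections.
    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen(2)                 # we expect exactly two connections
    port = server.getsockname()[1]

    t = threading.Thread(target=worker, args=(port,))
    t.start()

    in_sock, _ = server.accept()     # socket used to send the task input
    in_sock.sendall(" ".join(items).encode())
    in_sock.close()                  # closing it signals end-of-input

    out_sock, _ = server.accept()    # socket used to receive the output
    server.close()                   # no further connections expected
    result = b""
    while True:                      # read output until the worker closes it
        chunk = out_sock.recv(1024)
        if not chunk:
            break
        result += chunk
    out_sock.close()
    t.join()
    return result.decode().split()

print(compute(["a", "b"]))           # -> ['A', 'B']
```

Because input is fully written and closed before output is read, neither side can block waiting on a peer that is itself blocked, which is the deadlock the comment in the diff alludes to.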
[GitHub] spark pull request: Tighten up field/method visibility in Executor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4850#issuecomment-83810758 [Test build #28892 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28892/consoleFull) for PR 4850 at commit [`866fc60`](https://github.com/apache/spark/commit/866fc60652c3f98c6c608ca6c25c33f4219a540c). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class TaskCommitDenied(jobID: Int, partitionID: Int, attemptID: Int) extends TaskFailedReason ` * `class ExecutorSource(threadPool: ThreadPoolExecutor, executorId: String) extends Source `
[GitHub] spark pull request: Tighten up field/method visibility in Executor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4850#issuecomment-83810794 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28892/ Test PASSed.
[GitHub] spark pull request: [MLlib] SPARK-5954: Top by key
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5075#issuecomment-83821404 [Test build #28897 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28897/consoleFull) for PR 5075 at commit [`a80e0ec`](https://github.com/apache/spark/commit/a80e0ecd0ce96ffbeeaeb933dea1cada60e5863c). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class MLPairRDDFunctions[K: ClassTag, V: ClassTag](self: RDD[(K, V)]) extends Serializable ` * ` class CheckpointWriteHandler(`
[GitHub] spark pull request: [MLlib] SPARK-5954: Top by key
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5075#issuecomment-83821436 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28897/ Test PASSed.
[GitHub] spark pull request: [SPARK-6406] Launcher backward compatibility i...
Github user nishkamravi2 commented on the pull request: https://github.com/apache/spark/pull/5085#issuecomment-83825622 Sorry, clicked on the close button in error.
[GitHub] spark pull request: [SPARK-6371] [build] Update version to 1.4.0-S...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5056#discussion_r26804471 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala --- @@ -128,7 +128,7 @@ abstract class VertexRDD[VD]( * * @param other the other RDD[(VertexId, VD)] with which to diff against. */ - def diff(other: RDD[(VertexId, VD)]): VertexRDD[VD] + def diff(other: RDD[(VertexId, VD)]): VertexRDD[VD] = ??? --- End diff -- We shouldn't put ??? here, since it means we will implement this but we haven't gotten around to implementing it yet.
[GitHub] spark pull request: [SPARK-6367][SQL] Use the proper data type for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5094#issuecomment-83787628 [Test build #28893 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28893/consoleFull) for PR 5094 at commit [`d427d20`](https://github.com/apache/spark/commit/d427d20c0c347a16798589e89476d8c36b6ee353). * This patch merges cleanly.
[GitHub] spark pull request: [MLlib] SPARK-5954: Top by key
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/5075#discussion_r26805662 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -163,6 +163,28 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) } /** + * Returns the top k (largest) elements for each key from this RDD as defined by the specified + * implicit Ordering[T]. + * If the number of elements for a certain key is less than k, all of them will be returned. + * + * @param num k, the number of top elements to return + * @param ord the implicit ordering for T + * @return an RDD that contains the top k values for each key + */ + def topByKey(num: Int)(implicit ord: Ordering[V]): RDD[(K, Array[V])] = { +aggregateByKey(new BoundedPriorityQueue[V](num)(ord))( + seqOp = (queue, item) => { +queue += item +queue + }, + combOp = (queue1, queue2) => { +queue1 ++= queue2 +queue1 + } +).mapValues(_.toArray.sorted(ord.reverse)) --- End diff -- Hm OK that surprises me but if you verified it is required, leave it of course
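The `topByKey` implementation under review folds each value into a bounded priority queue per key (`seqOp`), merges queues across partitions (`combOp`), then sorts each queue descending. Here is an illustrative Python sketch of that same seqOp structure using a bounded min-heap; the names are invented and this is plain single-process code, not the Spark API:

```python
import heapq
from collections import defaultdict

def top_by_key(pairs, num):
    """Keep at most `num` largest values per key, mimicking
    aggregateByKey with a BoundedPriorityQueue-style buffer."""
    queues = defaultdict(list)          # key -> bounded min-heap

    def seq_op(queue, item):
        # Fold one value into a key's heap, evicting the current
        # minimum once the heap is full -- the bounded-queue trick.
        if len(queue) < num:
            heapq.heappush(queue, item)
        elif item > queue[0]:
            heapq.heapreplace(queue, item)
        return queue

    for k, v in pairs:
        seq_op(queues[k], v)

    # mapValues(_.toArray.sorted(ord.reverse)): report largest first
    return {k: sorted(q, reverse=True) for k, q in queues.items()}

data = [("a", 1), ("a", 5), ("a", 3), ("b", 2), ("a", 4)]
print(top_by_key(data, 2))  # {'a': [5, 4], 'b': [2]}
```

The heap keeps per-key state at O(num) memory regardless of how many values a key has, which is why the PR uses `aggregateByKey` rather than grouping all values and sorting.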
[GitHub] spark pull request: [SPARK-5654] Integrate SparkR
GitHub user shivaram opened a pull request: https://github.com/apache/spark/pull/5096 [SPARK-5654] Integrate SparkR This pull requests integrates SparkR, an R frontend for Spark. The SparkR package contains both RDD and DataFrame APIs in R and is integrated with Spark's submission scripts to work on different cluster managers. Some integration points that would be great to get feedback on: 1. Build procedure: SparkR requires R to be installed on the machine to be built. Right now we have a new Maven profile `-PsparkR` that can be used to enable SparkR builds 2. YARN cluster mode: The R package that is built needs to be present on the driver and all the worker nodes during execution. The R package location is currently set using SPARK_HOME, but this might not work on YARN cluster mode. The SparkR package represents the work of many contributors and attached below is a list of people along with areas they worked on edwardt (@edwart) - Documentation improvements Felix Cheung (@felixcheung) - Documentation improvements Hossein Falaki (@falaki) - Documentation improvements Chris Freeman (@cafreeman) - DataFrame API, Programming Guide Todd Gao (@7c00) - R worker Internals Ryan Hafen (@hafen) - SparkR Internals Qian Huang (@hqzizania) - RDD API Hao Lin (@hlin09) - RDD API, Closure cleaner Evert Lammerts (@evertlammerts) - DataFrame API Davies Liu (@davies) - DataFrame API, R worker internals, Merging with Spark Yi Lu (@lythesia) - RDD API, Worker internals Matt Massie (@massie) - Jenkins build Harihar Nahak (@hnahak87) - SparkR examples Oscar Olmedo (@oscaroboto) - Spark configuration Antonio Piccolboni (@piccolbo) - SparkR examples, Namespace bug fixes Dan Putler (@dputler) - Dataframe API, SparkR Install Guide Ashutosh Raina (@ashutoshraina) - Build improvements Josh Rosen (@joshrosen) - Travis CI build Sun Rui (@sun-rui)- RDD API, JVM Backend, Shuffle improvements Shivaram Venkataraman (@shivaram) - RDD API, JVM Backend, Worker Internals Zongheng Yang 
(@concretevitamin) - RDD API, Pipelined RDDs, Examples and EC2 guide You can merge this pull request into a Git repository by running: $ git pull https://github.com/amplab-extras/spark R Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5096.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5096 commit 9aa4acfeb2180b5b7c44302e1500d1bfe0639485 Author: Shivaram Venkataraman shivaram.venkatara...@gmail.com Date: 2015-02-27T18:56:32Z Merge pull request #184 from davies/socket [SPARKR-155] use socket in R worker commit 798f4536d9dfb069e0c8f1bbd1fb24be404a7c14 Author: cafreeman cfree...@alteryx.com Date: 2015-02-27T20:04:22Z Merge branch 'sparkr-sql' into dev commit 3b4642980547714373ab1960cb9a096e2fcf233a Author: Davies Liu davies@gmail.com Date: 2015-02-27T22:07:30Z Merge branch 'master' of github.com:amplab-extras/SparkR-pkg into random commit 5ef66fb8b03a635e309a5004a1b411b50f63ef9c Author: Davies Liu davies@gmail.com Date: 2015-02-27T22:33:07Z send back the port via temporary file commit 2808dcfd2c0630625a5aa723cf0dbce642cd8f95 Author: cafreeman cfree...@alteryx.com Date: 2015-02-27T23:54:17Z Three more DataFrame methods - `repartition` - `distinct` - `sampleDF` commit cad0f0ca8c11ec5b3412b9926c92e89297a31b0a Author: cafreeman cfree...@alteryx.com Date: 2015-02-28T00:46:58Z Fix docs and indents commit 27dd3a09ce37d8afe385ccda35b425ac5655905c Author: lythesia iranaik...@gmail.com Date: 2015-02-28T02:00:41Z modify tests for repartition commit 889c265ee41f8faf3ee72e253cf019cb3a9a65a5 Author: cafreeman cfree...@alteryx.com Date: 2015-02-28T02:08:18Z numToInt utility function Added `numToInt` converter function for allowing numeric arguments when integers are required. Updated `repartition`. 
commit 7b0d070bc0fd18e26d94dfd4dbcc500963faa5bb Author: lythesia iranaik...@gmail.com Date: 2015-02-28T02:10:35Z keep partitions check commit b0e7f731f4c64daac27a975a87b22c7276bbfe61 Author: cafreeman cfree...@alteryx.com Date: 2015-02-28T02:28:08Z Update `sampleDF` test commit ad0935ef12fc6639a6ce45f1860d0f62c07ae838 Author: lythesia iranaik...@gmail.com Date: 2015-02-28T02:50:34Z minor fixes commit 613464951add64f1f42a1bb814d86c0aa979cc18 Author: Shivaram Venkataraman shivaram.venkatara...@gmail.com Date: 2015-02-28T03:05:45Z Merge pull request #187 from cafreeman/sparkr-sql Three more DataFrame methods commit 0346e5fc907aab71aef122e6ddc1b96f93d9abbf Author: Davies Liu davies@gmail.com Date: 2015-02-28T07:05:42Z address comment commit a00f5029279ca1e14afb4f1b63d91e946bddfd73 Author: lythesia
[GitHub] spark pull request: [SPARK-5654] Integrate SparkR
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5096#discussion_r26809045 --- Diff: core/src/main/scala/org/apache/spark/api/r/RBackendHandler.scala --- @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.api.r + +import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream} + +import scala.collection.mutable.HashMap + +import io.netty.channel.ChannelHandler.Sharable +import io.netty.channel.{ChannelHandlerContext, SimpleChannelInboundHandler} + +import org.apache.spark.Logging +import org.apache.spark.api.r.SerDe._ + +/** + * Handler for RBackend + * TODO: This is marked as sharable to get a handle to RBackend. Is it safe to re-use + * this across connections ?
+ */ +@Sharable +private[r] class RBackendHandler(server: RBackend) + extends SimpleChannelInboundHandler[Array[Byte]] with Logging { + + override def channelRead0(ctx: ChannelHandlerContext, msg: Array[Byte]) { +val bis = new ByteArrayInputStream(msg) +val dis = new DataInputStream(bis) + +val bos = new ByteArrayOutputStream() +val dos = new DataOutputStream(bos) + +// First bit is isStatic +val isStatic = readBoolean(dis) +val objId = readString(dis) +val methodName = readString(dis) +val numArgs = readInt(dis) + +if (objId == "SparkRHandler") { + methodName match { +case "stopBackend" => + writeInt(dos, 0) + writeType(dos, "void") + server.close() +case "rm" => + try { +val t = readObjectType(dis) +assert(t == 'c') +val objToRemove = readString(dis) +JVMObjectTracker.remove(objToRemove) +writeInt(dos, 0) +writeObject(dos, null) + } catch { +case e: Exception => + logError(s"Removing $objId failed", e) + writeInt(dos, -1) + } +case _ => dos.writeInt(-1) + } +} else { + handleMethodCall(isStatic, objId, methodName, numArgs, dis, dos) +} + +val reply = bos.toByteArray +ctx.write(reply) + } + + override def channelReadComplete(ctx: ChannelHandlerContext) { +ctx.flush() + } + + override def exceptionCaught(ctx: ChannelHandlerContext, cause: Throwable) { +// Close the connection when an exception is raised.
+cause.printStackTrace() +ctx.close() + } + + def handleMethodCall( + isStatic: Boolean, + objId: String, + methodName: String, + numArgs: Int, + dis: DataInputStream, + dos: DataOutputStream) { +var obj: Object = null +try { + val cls = if (isStatic) { +Class.forName(objId) + } else { +JVMObjectTracker.get(objId) match { + case None => throw new IllegalArgumentException("Object not found " + objId) + case Some(o) => +obj = o +o.getClass +} + } + + val args = readArgs(numArgs, dis) + + val methods = cls.getMethods + val selectedMethods = methods.filter(m => m.getName == methodName) + if (selectedMethods.length > 0) { +val methods = selectedMethods.filter { x => + matchMethod(numArgs, args, x.getParameterTypes) +} +if (methods.isEmpty) { + logWarning(s"cannot find matching method ${cls}.$methodName. " ++ s"Candidates are:") + selectedMethods.foreach { method => + logWarning(s"$methodName(${method.getParameterTypes.mkString(",")})") + } + throw new Exception(s"No matched method found for $cls.$methodName") +} +val ret = methods.head.invoke(obj, args:_*) +
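The `handleMethodCall` excerpt above selects candidate methods by name via reflection, filters them by whether the received arguments match the parameter list, and invokes the first match. A tiny Python analogue of that dispatch idea, using `getattr` and `inspect` (names are illustrative, and this skips overload resolution by type, which the Scala code also does):

```python
import inspect

class Target:
    """Stand-in for a JVM object the backend would dispatch into."""
    def add(self, a, b):
        return a + b
    def greet(self, name):
        return "hi " + name

def handle_method_call(obj, method_name, args):
    # Look up the method by name, reject it if the argument count
    # doesn't match, then invoke -- mirroring the filter-then-invoke
    # structure of the quoted handler.
    method = getattr(obj, method_name, None)
    if method is None:
        raise ValueError("Object has no method named " + method_name)
    params = inspect.signature(method).parameters
    if len(params) != len(args):
        raise ValueError(
            f"no matching method {method_name} taking {len(args)} args")
    return method(*args)

print(handle_method_call(Target(), "add", [2, 3]))       # 5
print(handle_method_call(Target(), "greet", ["world"]))  # hi world
```

As in the Scala handler, a name miss and an arity miss are distinguishable failures, which is what the candidate-logging branch in the diff is for.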
[GitHub] spark pull request: [SQL] Checking data types when resolving types
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4685#discussion_r26810586 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala --- @@ -18,21 +18,28 @@ package org.apache.spark.sql.catalyst.expressions import org.apache.spark.sql.catalyst.analysis.UnresolvedException -import org.apache.spark.sql.catalyst.errors.TreeNodeException import org.apache.spark.sql.types._ case class UnaryMinus(child: Expression) extends UnaryExpression { type EvaluatedType = Any + override lazy val resolved = child.resolved && +(child.dataType.isInstanceOf[NumericType] || child.dataType.isInstanceOf[NullType]) + def dataType = child.dataType override def foldable = child.foldable def nullable = child.nullable override def toString = s"-$child" - lazy val numeric = dataType match { -case n: NumericType => n.numeric.asInstanceOf[Numeric[Any]] -case other => sys.error(s"Type $other does not support numeric operations") - } + val numeric = +if (resolved) { + dataType match { +case n: NumericType => n.numeric.asInstanceOf[Numeric[Any]] +case n: NullType => UnresolvedNumeric + } +} else { + UnresolvedNumeric +} --- End diff -- Instead of `UnresolvedNumeric`, how about just let it be `null`? ---
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-83775816 [Test build #2 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/2/consoleFull) for PR 5093 at commit [`94d3547`](https://github.com/apache/spark/commit/94d35478c8205386ac4ff0e265a0bfbb073bc8c7). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch adds no new dependencies
[GitHub] spark pull request: [SPARK-6367][SQL] Use the proper data type for...
GitHub user yhuai opened a pull request: https://github.com/apache/spark/pull/5094 [SPARK-6367][SQL] Use the proper data type for those expressions that are hijacking existing data types. This PR adds internal UDTs for expressions that are hijacking existing data types. The following UDTs are added: * `HyperLogLogUDT` (`BinaryType` as the SQL type) for `ApproxCountDistinctPartition` * `OpenHashSetUDT` (`ArrayType` as the SQL type) for `CollectHashSet`, `NewSet`, `AddItemToSet`, and `CombineSets`. I am also adding more unit tests for aggregation with code gen enabled. JIRA: https://issues.apache.org/jira/browse/SPARK-6367 You can merge this pull request into a Git repository by running: $ git pull https://github.com/yhuai/spark expressionType Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5094.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5094 commit a384b510c0cbfbf44855d2939aae737c26c20c85 Author: Yin Huai yh...@databricks.com Date: 2015-03-19T21:59:04Z Add UDTs for expressions that return HyperLogLog and OpenHashSet.
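The PR above gives an internal type (e.g. an OpenHashSet) a user-defined type that declares an existing SQL type (`ArrayType`) as its storage representation, with serialize/deserialize hooks between the two. A toy Python sketch of that serialize/deserialize pairing; the function names are invented and this is not Spark's UDT API:

```python
# A set has no SQL type of its own, so store it as an array-like sorted
# list (the "ArrayType" representation) and rebuild the set on read --
# the two conversions a UDT must supply.
def serialize_set(s):
    # engine-facing, storable representation
    return sorted(s)

def deserialize_set(arr):
    # user-facing type on the way back out
    return set(arr)

stored = serialize_set({3, 1, 2})
print(stored)                                # [1, 2, 3]
print(deserialize_set(stored) == {1, 2, 3})  # True
```

The point of the PR is that the engine only ever sees the declared SQL type, so code paths like codegen no longer need special cases for the hijacked internal types.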
[GitHub] spark pull request: [SPARK-5654] Integrate SparkR into Apache Spar...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5077#issuecomment-83775633 [Test build #28885 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28885/consoleFull) for PR 5077 at commit [`3eacfc0`](https://github.com/apache/spark/commit/3eacfc072758a445d1f01b29001c69683ac5b457). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-83780611 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28890/ Test FAILed.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-83794104 [Test build #28895 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28895/consoleFull) for PR 5093 at commit [`f8011d8`](https://github.com/apache/spark/commit/f8011d8886e0a2a2db74ae5715cb324eb30eedbb). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-83794115 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28895/ Test FAILed.
[GitHub] spark pull request: [SPARK-6420] Driver's Block Manager does not u...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5095#issuecomment-83802093 Can one of the admins verify this patch?
[GitHub] spark pull request: [MLlib] SPARK-5954: Top by key
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5075#issuecomment-83810324 [Test build #28891 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28891/consoleFull) for PR 5075 at commit [`82dded9`](https://github.com/apache/spark/commit/82dded96926f98d8a72cf40cbbc6987b191962f0). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class MLPairRDDFunctions[K: ClassTag, V: ClassTag](self: RDD[(K, V)]) extends Serializable ` * ` class CheckpointWriteHandler(`
[GitHub] spark pull request: [MLlib] SPARK-5954: Top by key
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5075#issuecomment-83810341 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28891/ Test PASSed.
[GitHub] spark pull request: [SPARK-5654] Integrate SparkR
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5096#discussion_r26808740

--- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala ---

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.api.r

import java.io._
import java.net.ServerSocket
import java.util.{Map => JMap}

import scala.collection.JavaConversions._
import scala.io.Source
import scala.reflect.ClassTag
import scala.util.Try

import org.apache.spark.api.java.{JavaPairRDD, JavaRDD, JavaSparkContext}
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD
import org.apache.spark._

private abstract class BaseRRDD[T: ClassTag, U: ClassTag](
    parent: RDD[T],
    numPartitions: Int,
    func: Array[Byte],
    deserializer: String,
    serializer: String,
    packageNames: Array[Byte],
    rLibDir: String,
    broadcastVars: Array[Broadcast[Object]])
  extends RDD[U](parent) with Logging {
  override def getPartitions = parent.partitions

  override def compute(split: Partition, context: TaskContext): Iterator[U] = {

    // The parent may be also an RRDD, so we should launch it first.
    val parentIterator = firstParent[T].iterator(split, context)

    // we expect two connections
    val serverSocket = new ServerSocket(0, 2)
    val listenPort = serverSocket.getLocalPort()

    // The stdout/stderr is shared by multiple tasks, because we use one daemon
    // to launch child process as worker.
    val errThread = RRDD.createRWorker(rLibDir, listenPort)

    // We use two sockets to separate input and output, then it's easy to manage
    // the lifecycle of them to avoid deadlock.
    // TODO: optimize it to use one socket

    // the socket used to send out the input of task
    serverSocket.setSoTimeout(1)
    val inSocket = serverSocket.accept()
    startStdinThread(inSocket.getOutputStream(), parentIterator, split.index)

    // the socket used to receive the output of task
    val outSocket = serverSocket.accept()
    val inputStream = new BufferedInputStream(outSocket.getInputStream)
    val dataStream = openDataStream(inputStream)
    serverSocket.close()

    try {
      return new Iterator[U] {
        def next(): U = {
          val obj = _nextObj
          if (hasNext) {
            _nextObj = read()
          }
          obj
        }

        var _nextObj = read()

        def hasNext(): Boolean = {
          val hasMore = (_nextObj != null)
          if (!hasMore) {
            dataStream.close()
          }
          hasMore
        }
      }
    } catch {
      case e: Exception =>
        throw new SparkException("R computation failed with\n " + errThread.getLines())
    }
  }

  /**
   * Start a thread to write RDD data to the R process.
   */
  private def startStdinThread[T](
      output: OutputStream,
      iter: Iterator[T],
      splitIndex: Int) = {

    val env = SparkEnv.get
    val bufferSize = System.getProperty("spark.buffer.size", "65536").toInt
    val stream = new BufferedOutputStream(output, bufferSize)

    new Thread("writer for R") {
      override def run() {
        try {
          SparkEnv.set(env)
          val dataOut = new DataOutputStream(stream)
          dataOut.writeInt(splitIndex)

          SerDe.writeString(dataOut, deserializer)
          SerDe.writeString(dataOut, serializer)

          dataOut.writeInt(packageNames.length)
          dataOut.write(packageNames)

          dataOut.writeInt(func.length)
          dataOut.write(func)

          dataOut.writeInt(broadcastVars.length)
```
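The `compute` method in the quoted diff accepts two connections on one `ServerSocket`: the first carries the task's input, the second its output, so each side can be closed independently without deadlock. The handshake can be sketched in miniature; a hypothetical echo-style worker thread stands in for the R process here, and this is not the actual SparkR wire protocol.

```java
import java.io.*;
import java.net.*;
import java.util.*;

public class TwoSocketSketch {
    static List<Integer> run() throws Exception {
        ServerSocket server = new ServerSocket(0, 2); // backlog 2: we expect exactly two connections
        int port = server.getLocalPort();

        // Stand-in "worker": connects twice, reads ints on the first connection,
        // writes each int doubled on the second connection.
        Thread worker = new Thread(() -> {
            try {
                DataInputStream in =
                    new DataInputStream(new Socket("localhost", port).getInputStream());
                DataOutputStream out =
                    new DataOutputStream(new Socket("localhost", port).getOutputStream());
                for (int i = 0; i < 3; i++) out.writeInt(in.readInt() * 2);
                out.flush();
            } catch (IOException e) { throw new UncheckedIOException(e); }
        });
        worker.start();

        // "Driver" side: the first accepted connection is the input channel,
        // the second is the output channel, mirroring RRDD.compute above.
        DataOutputStream stdin = new DataOutputStream(server.accept().getOutputStream());
        Thread writer = new Thread(() -> {
            try {
                for (int i = 1; i <= 3; i++) stdin.writeInt(i);
                stdin.flush();
            } catch (IOException e) { throw new UncheckedIOException(e); }
        });
        writer.start();

        DataInputStream results = new DataInputStream(server.accept().getInputStream());
        server.close(); // the listening socket is no longer needed once both peers connected

        List<Integer> doubled = new ArrayList<>();
        for (int i = 0; i < 3; i++) doubled.add(results.readInt());
        worker.join();
        writer.join();
        return doubled;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // [2, 4, 6]
    }
}
```

Writing on a dedicated thread while reading on the caller's thread is what keeps the single-server-socket design deadlock-free even when both directions have data in flight.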
[GitHub] spark pull request: [SPARK-6286][Mesos][minor] Handle missing Meso...
Github user dragos commented on the pull request: https://github.com/apache/spark/pull/5088#issuecomment-83810423 LGTM, FWIW :)
[GitHub] spark pull request: [SPARK-5654] Integrate SparkR
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5096#discussion_r26809309 --- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala --- (same `RRDD.scala` hunk as quoted above)
[GitHub] spark pull request: [SPARK-5654] Integrate SparkR
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5096#discussion_r26809287 --- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala --- (same `RRDD.scala` hunk as quoted above)
[GitHub] spark pull request: [SPARK-6367][SQL] Use the proper data type for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5094#issuecomment-83812455 [Test build #28893 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28893/consoleFull) for PR 5094 at commit [`d427d20`](https://github.com/apache/spark/commit/d427d20c0c347a16798589e89476d8c36b6ee353). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6367][SQL] Use the proper data type for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5094#issuecomment-83812472 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28893/ Test PASSed.
[GitHub] spark pull request: [SPARK-6207] [YARN] [SQL] Adds delegation toke...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/5031#discussion_r26811012

--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---

```scala
  /**
   * Obtains token for the Hive metastore and adds them to the credentials.
   */
  private def obtainTokenForHiveMetastore(conf: Configuration, credentials: Credentials) {
    if (UserGroupInformation.isSecurityEnabled /* And Hive is enabled */) {
      val hc = org.apache.hadoop.hive.ql.metadata.Hive.get
      val principal = hc.getConf().get(HiveConf.ConfVars.METASTORE_KERBEROS_PRINCIPAL.varname)
      val username = UserGroupInformation.getCurrentUser().getUserName

      if (principal == null) {
        val errorMessage = "Required hive metastore principal is not configured!"
        logError(errorMessage)
        throw new IllegalArgumentException(errorMessage)
      }

      val tokenStr = hc.getDelegationToken(username, principal)
      val hive2Token = new Token[DelegationTokenIdentifier]()
      hive2Token.decodeFromUrlString(tokenStr)
      credentials.addToken(new Text("hive.server2.delegation.token"), hive2Token)
      logDebug("Added the Hive Server 2 token to conf.")
      org.apache.hadoop.hive.ql.metadata.Hive.closeCurrent
```

End diff --

HDFS (namenode) delegation tokens are renewed for you by the YARN resourcemanager, until they expire after a week. (Then you need PR #4688.) Unfortunately the resourcemanager doesn't handle hive or hbase tokens. I personally think putting in this code for hive, and then possibly hbase, so we know how to get them is OK, as long as the interfaces we are using are public and not likely to change. However, we should have a way to skip it if it's not configured. Yes, long-running services should be able to renew or reacquire with what Hari is doing.
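The reviewer's request ("a way to skip it if it's not configured") amounts to a guard that checks both the security flag and the principal before fetching a token, instead of throwing. A minimal sketch follows; the configuration is modeled as a plain map, and the key name mirrors the Hive metastore principal setting but is illustrative only, not the real Hadoop/Hive API.

```java
import java.util.*;

final class HiveTokenGuard {
    // Illustrative key; mirrors HiveConf.ConfVars.METASTORE_KERBEROS_PRINCIPAL.varname.
    static final String PRINCIPAL_KEY = "hive.metastore.kerberos.principal";

    /** Fetch a metastore token only when security is on AND a principal is configured. */
    static boolean shouldObtainToken(boolean securityEnabled, Map<String, String> conf) {
        String principal = conf.get(PRINCIPAL_KEY);
        return securityEnabled && principal != null && !principal.isEmpty();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // No principal configured: skip silently instead of throwing IllegalArgumentException.
        System.out.println(shouldObtainToken(true, conf));  // false
        conf.put(PRINCIPAL_KEY, "hive/_HOST@EXAMPLE.COM");
        System.out.println(shouldObtainToken(true, conf));  // true
    }
}
```

With this shape, clusters that run secure HDFS but no Hive metastore simply bypass the token code path rather than failing job submission.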
[GitHub] spark pull request: [SPARK-3665][GraphX] Java API for GraphX
Github user kdatta commented on the pull request: https://github.com/apache/spark/pull/3234#issuecomment-83823758 I had to add the JUnit dependency in graphx/pom.xml to compile. Did you see this issue? We might have to update the pom file. -Kushal.
[GitHub] spark pull request: [SPARK-6202] [SQL] enable variable substitutio...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4930#discussion_r26811257

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---

```scala
  val describedTable = "DESCRIBE (\\w+)".r

  val vs = new VariableSubstitution()

  protected[hive] class HiveQLQueryExecution(hql: String)
    extends this.SubstitutedHiveQLQueryExecution(vs.substitute(hiveconf, hql))

  // we should substitute variables in hql to pass the text to parseSql() as a parameter.
  // Hive parser needs substituted text. HiveContext.sql() does this but returns a DataFrame,
  // while we need a logicalPlan so we cannot reuse that.
  protected[hive] class SubstitutedHiveQLQueryExecution(hql: String)
    extends this.QueryExecution(HiveQl.parseSql(hql)) {
    def hiveExec() = runSqlHive(hql)
```

End diff --

@adrian-wang how about adding the substitution in `HiveContext.runSqlHive` or `HiveContext.runHive`? Then we probably wouldn't need to change anything in `TestHive`.
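The discussion turns on what `VariableSubstitution` does before parsing: expanding `${var}` references in the SQL text. A minimal illustrative sketch of that expansion step follows (Hive's actual rules are richer; it also consults system properties and the `hivevar:`/`hiveconf:` namespaces, none of which are modeled here).

```java
import java.util.*;
import java.util.regex.*;

final class Substitute {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replace each ${name} with its value from vars; unknown names are left untouched. */
    static String substitute(String sql, Map<String, String> vars) {
        Matcher m = VAR.matcher(sql);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            // quoteReplacement avoids treating '$' or '\' in values as group references
            m.appendReplacement(sb, Matcher.quoteReplacement(value != null ? value : m.group(0)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("db", "test");
        System.out.println(substitute("USE ${db}", vars)); // USE test
    }
}
```

Doing this expansion once, centrally (as the reviewer suggests, in `runSqlHive`), means every caller hands the parser already-substituted text.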
[GitHub] spark pull request: [SPARK-6370][core] Documentation: Improve all ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5097#issuecomment-83832497 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-6354][SQL] Replace the plan which is pa...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5044#issuecomment-83393681 [Test build #28859 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28859/consoleFull) for PR 5044 at commit [`e65a19f`](https://github.com/apache/spark/commit/e65a19f4bd9a731ad9b75f387f96b3612f05f66f). * This patch merges cleanly.
[GitHub] spark pull request: [MLLib]SPARK-6348:Enable useFeatureScaling in ...
Github user tanyinyan commented on the pull request: https://github.com/apache/spark/pull/5055#issuecomment-83407453 Yes, I have made this constructor and setter public.
[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/4783#issuecomment-83410489 @shivaram @srowen `Tuple2(Int, Int)` gets specialized to the `Tuple2$mcII$sp` class, but `Tuple2$mcII$sp` is a subclass of `Tuple2`, so in our implementation the specialized class picks up two additional object references (`_1` and `_2` from the superclass `Tuple2`, in our case). For `Tuple2(Int, Int)`, SizeEstimator will therefore give 32 bytes rather than 24 bytes. In theory, the field layout of `Tuple2(1, 2)` should be something like the following:
```
scala.Tuple2$mcII$sp object internals:
 OFFSET  SIZE    TYPE DESCRIPTION                              VALUE
      0     4         (object header)                          01 00 00 00 ( 0001 )
      4     4         (object header)                          00 00 00 00 ( )
      8     4         (object header)                          05 c3 00 f8 ( 0101 1100 0011 1000)
     12     4  Object Tuple2._1                                null
     16     4  Object Tuple2._2                                null
     20     4     int Tuple2$mcII$sp._1$mcI$sp                 1
     24     4     int Tuple2$mcII$sp._2$mcI$sp                 2
     28     4         (loss due to the next object alignment)
Instance size: 32 bytes (reported by Instrumentation API)
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
```
But in practice, the size of `Tuple2(1, 2)` is 24 bytes. Is there any Scala expert we can ping? I really want to know why `Tuple2(1, 2)` can be 24 bytes when the specialized version is involved.
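The 32-vs-24-byte discrepancy discussed above follows directly from back-of-envelope layout arithmetic: a 12-byte object header (with compressed oops), 4 bytes per reference or int field, rounded up to 8-byte alignment. The sketch below models only that arithmetic; it is not SizeEstimator's actual code.

```java
final class LayoutMath {
    static final int HEADER = 12;           // object header size with compressed oops
    static final int SLOT = 4;              // both compressed refs and ints take 4 bytes

    /** Round an instance size up to the JVM's 8-byte object alignment. */
    static long align8(long size) { return (size + 7) & ~7L; }

    /** Modeled instance size for an object with the given field counts. */
    static long sizeOf(int refFields, int intFields) {
        return align8(HEADER + (long) SLOT * refFields + (long) SLOT * intFields);
    }

    public static void main(String[] args) {
        // Tuple2$mcII$sp as SizeEstimator counts it: inherited _1/_2 refs plus two ints.
        System.out.println(sizeOf(2, 2)); // 32
        // The same object with the two inherited refs not counted.
        System.out.println(sizeOf(0, 2)); // 24
    }
}
```

So 32 bytes is exactly what you get when the inherited `_1`/`_2` reference slots are counted, and 24 bytes is what you get when they are not; the question in the thread is which one the JVM actually materializes.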
[GitHub] spark pull request: [SPARK-6286][Mesos][minor] Handle missing Meso...
GitHub user jongyoul opened a pull request: https://github.com/apache/spark/pull/5088 [SPARK-6286][Mesos][minor] Handle missing Mesos case TASK_ERROR - Made TaskState.isFailed for handling TASK_LOST and TASK_ERROR and synchronizing CoarseMesosSchedulerBackend and MesosSchedulerBackend - This is related to #5000 You can merge this pull request into a Git repository by running: $ git pull https://github.com/jongyoul/spark SPARK-6286-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5088.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5088 commit ac4336ae5988598a9ed663588606c410dd154480 Author: Jongyoul Lee jongy...@gmail.com Date: 2015-03-19T09:13:28Z [SPARK-6286][Mesos][minor] Handle missing Mesos case TASK_ERROR - Made TaskState.isFailed for handling TASK_LOST and TASK_ERROR and synchronizing CoarseMesosSchedulerBackend and MesosSchedulerBackend
[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4610#issuecomment-83433562 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28860/ Test PASSed.
[GitHub] spark pull request: [SPARK-6286][Mesos][minor] Handle missing Meso...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5088#issuecomment-83440771 [Test build #28865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28865/consoleFull) for PR 5088 at commit [`4f2362f`](https://github.com/apache/spark/commit/4f2362f55009688fae168ff22c0f5dfee22abda1). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4449][Core] Specify port range in spark
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3314#issuecomment-83440739 [Test build #28866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28866/consoleFull) for PR 3314 at commit [`fa5bcbb`](https://github.com/apache/spark/commit/fa5bcbbb4215bef56006bbe0d1081a2c237b8b72). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4610#issuecomment-83399347 [Test build #28860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28860/consoleFull) for PR 4610 at commit [`c387fce`](https://github.com/apache/spark/commit/c387fcef43ed45bc6469902216c57b9937ae5a1d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6408] [SQL] Fix JDBCRDD filtering strin...
GitHub user ypcat opened a pull request: https://github.com/apache/spark/pull/5087 [SPARK-6408] [SQL] Fix JDBCRDD filtering string literals You can merge this pull request into a Git repository by running: $ git pull https://github.com/ypcat/spark spark-6408 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5087.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5087 commit 896253457361c75ec8678950d9549ed4187f895b Author: Pei-Lun Lee pl...@appier.com Date: 2015-03-19T08:20:51Z [SPARK-6408] [SQL] Fix filtering string literals
[GitHub] spark pull request: [SPARK-6397][SQL] Check the missingInput simpl...
Github user haiyangsea commented on the pull request: https://github.com/apache/spark/pull/5082#issuecomment-83411727 It looks like a great feature!
[GitHub] spark pull request: [SPARK-5259][CORE]Make sure mapStage.pendingta...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4055#issuecomment-83422662 This patch has been forgotten by us... @srowen @markhamstra @kayousterhout this patch can prevent the endless retry that may occur after an executor is killed or lost.