[GitHub] spark pull request: [Minor] rat exclude dependency-reduced-pom.xml
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/2326#issuecomment-55064478 Jenkins, retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55094784 @allwefantasy My test corpus is `196558` documents and `7897767` words, with `100` iterations. How many words in total do your 240,000 documents contain? You can post a screenshot of the stage run; the URL is usually something like stages/stage?id=226. For example, a screenshot like this: ![qq20140910-1](https://cloud.githubusercontent.com/assets/302879/4215649/df5ead34-38d0-11e4-868a-54553dc4f910.png)
[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55095890 @allwefantasy I think the code here, `Document(parts(0).toInt, (0 until wordInfo.value.size).map(k => values.getOrElse(k, 0)).toArray)`, is somewhat problematic. It should be handled like this: each word maps to one integer, e.g. a = 1, b = 2, c = 3, d = 4 ... z = 27. If the tokenized sequence of your document is the four words a, b, d, a, the Document instance is created like this: `Document(1, Array(1,2,4,1))`. Document content is essentially the result of tokenizing the document's words; it just represents the string array a, b, d, a as an integer array.
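The encoding described above can be sketched as follows (an illustrative example, not code from the PR; the `vocab` mapping and `encode` helper are hypothetical names):

```python
# Hypothetical sketch of the word-to-integer encoding described above:
# each word maps to one integer, and a tokenized document becomes a
# (document id, integer array) pair like Document(1, Array(1,2,4,1)).
vocab = {"a": 1, "b": 2, "c": 3, "d": 4}

def encode(doc_id, tokens):
    # Replace each token with its integer id, preserving order.
    return doc_id, [vocab[t] for t in tokens]

print(encode(1, ["a", "b", "d", "a"]))  # (1, [1, 2, 4, 1])
```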
[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55096559 @srowen I will try to translate the comments into English
[GitHub] spark pull request: SPARK-3470 [CORE] [STREAMING] Add Closeable / ...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/2346#issuecomment-55132817 The relevant PR: #991
[GitHub] spark pull request: SPARK-2482: Resolve sbt warnings during build
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1330#issuecomment-55210927 The code has been updated.
[GitHub] spark pull request: SPARK-2482: Resolve sbt warnings during build
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/1330#discussion_r17401380 --- Diff: pom.xml ---
@@ -839,7 +839,6 @@
        <arg>-unchecked</arg>
        <arg>-deprecation</arg>
        <arg>-feature</arg>
-       <arg>-language:postfixOps</arg>
--- End diff --
@jkbradley I removed this parameter. The related discussion is in #1069.
[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55223673 @allwefantasy Spark can adjust the number of tasks an executor runs concurrently. If you want each executor to run 17 tasks at a time, you can add the following setting to the `conf/spark-defaults.conf` file: `spark.executor.cores 17`
[GitHub] spark pull request: SPARK-2482: Resolve sbt warnings during build
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1330#issuecomment-55236678 No postfix warnings in 179ba61 .
[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-55280269 @allwefantasy The current code creates too many TopicModel instances during the iterative computation; I am now trying to fix this. Thanks for your feedback.
[GitHub] spark pull request: [SPARK-2491] Don't handle uncaught exceptions ...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1482#issuecomment-61363413 Jenkins, retest this please.
[GitHub] spark pull request: Spark shell class path is not correctly set if...
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/3050 Spark shell class path is not correctly set if spark.driver.extraClassPath is set in defaults.conf You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark SPARK-4161 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3050.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3050 commit 38890abc5b87222e25998788a35f29d994f08050 Author: GuoQiang Li wi...@qq.com Date: 2014-11-01T17:30:11Z Spark shell class path is not correctly set if spark.driver.extraClassPath is set in defaults.conf
[GitHub] spark pull request: [SPARK-4161]Spark shell class path is not corr...
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/3051 [SPARK-4161]Spark shell class path is not correctly set if spark.driver.extraClassP... ...ath is set in defaults.conf.(branch-1.1 backport) You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark SPARK-4161_1.1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3051.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3051 commit 44bad33cf8bd3da2445c606985090041b3154b7b Author: GuoQiang Li wi...@qq.com Date: 2014-11-01T17:32:56Z Spark shell class path is not correctly set if spark.driver.extraClassPath is set in defaults.conf
[GitHub] spark pull request: [Minor] Minor bug fixes in bin/run-example
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/3069 [Minor] Minor bug fixes in bin/run-example `./sbt/sbt clean assembly` generates `examples/target/scala-2.10/spark-examples_2-10-1.2.0-SNAPSHOT-hadoop1.0.4.jar` You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark run-example Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3069.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3069 commit e13fab79974eca197daee72b21bea57dccb3d8fb Author: GuoQiang Li wi...@qq.com Date: 2014-11-03T06:47:25Z Small bug fixes in bin/run-example
[GitHub] spark pull request: [Minor] Minor bug fixes in bin/run-example
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/3069#issuecomment-61449081 `mvn package` generates a file named like `spark-examples-1.2.0-SNAPSHOT-*`, while `./sbt/sbt clean assembly` generates a file named like `spark-examples_2-10-1.2.0-SNAPSHOT-*` (note the extra `_2-10`). The glob `spark-examples-*hadoop*.jar` only matches `spark-examples-1.2.0-SNAPSHOT-*`.
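The mismatch can be checked quickly with Python's `fnmatch` (an illustrative sketch; the shell glob in bin/run-example behaves the same way for these file names):

```python
from fnmatch import fnmatch

# The glob used by the script expects a "-" right after "spark-examples",
# so the sbt-built jar name (which inserts "_2-10") never matches it.
pattern = "spark-examples-*hadoop*.jar"
mvn_jar = "spark-examples-1.2.0-SNAPSHOT-hadoop1.0.4.jar"       # mvn package
sbt_jar = "spark-examples_2-10-1.2.0-SNAPSHOT-hadoop1.0.4.jar"  # sbt assembly

print(fnmatch(mvn_jar, pattern))  # True
print(fnmatch(sbt_jar, pattern))  # False
```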
[GitHub] spark pull request: [Minor] Minor bug fixes in bin/run-example
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/3069#issuecomment-61463811 The current solution is simple to implement, and other scripts already use the same approach, e.g. [compute-classpath.cmd#L52](https://github.com/apache/spark/blob/master/bin/compute-classpath.cmd#L52)
[GitHub] spark pull request: [SPARK-3797] Run external shuffle service in Y...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/3082#discussion_r19935139 --- Diff: make-distribution.sh --- @@ -181,6 +181,9 @@ echo Spark $VERSION$GITREVSTRING built for Hadoop $SPARK_HADOOP_VERSION $DI # Copy jars cp $FWDIR/assembly/target/scala*/*assembly*hadoop*.jar $DISTDIR/lib/ cp $FWDIR/examples/target/scala*/spark-examples*.jar $DISTDIR/lib/ +cp $FWDIR/network/yarn/target/scala*/spark-network-yarn*.jar $DISTDIR/lib/ --- End diff -- @andrewor14 Here is a problem: I use this command line: `./make-distribution.sh -Dhadoop.version=2.3.0-cdh5.0.1 -Dyarn.version=2.3.0-cdh5.0.1 -Phadoop-2.3 -Pyarn -Pnetlib-lgpl` , but `$FWDIR/network/yarn/target/scala*/spark-network-yarn*.jar` does not exist
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-62085010 We should use matrix operations to compute the forward and backward propagation; see http://deeplearning.stanford.edu/wiki/index.php/Neural_Network_Vectorization
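The vectorization idea, sketched in NumPy (an illustration of the linked tutorial's point, not the PR's code; layer sizes are made up): one matrix multiplication per layer processes the whole batch at once instead of looping over examples.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 10))   # batch of 32 examples, 10 features each
W1 = rng.standard_normal((10, 5))   # input -> hidden weights
W2 = rng.standard_normal((5, 2))    # hidden -> output weights

H = sigmoid(X @ W1)  # hidden activations for all 32 examples in one multiply
Y = sigmoid(H @ W2)  # network outputs, one row per example
print(Y.shape)       # (32, 2)
```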
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-62089562 We cannot use the existing Gradient classes; the whole iterative process should be completed in the form of matrix computations. Moreover, we can follow the design of the ALS algorithm and cut the matrix into appropriately sized pieces.
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-62254979 I agree with what @debasish83 said. We should find a suitable solution for distributed storage of the weight matrix.
[GitHub] spark pull request: [WIP][SPARK-4251][MLLIB]Add Restricted Boltzma...
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/3222 [WIP][SPARK-4251][MLLIB]Add Restricted Boltzmann machine(RBM) algorithm to MLlib You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark rbm Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3222.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3222 commit 8ced3e8784c75dbb0c874fe207db61aa1f8e6e7b Author: GuoQiang Li wi...@qq.com Date: 2014-11-12T09:15:09Z Add Restricted Boltzmann machine(RBM) algorithm to MLlib
[GitHub] spark pull request: Support cross building for Scala 2.11
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/3159#issuecomment-62700413 @pwendell @ScrapCodes This patch has a bug. After building with `./make-distribution.sh -Dhadoop.version=2.3.0-cdh5.0.1 -Dyarn.version=2.3.0-cdh5.0.1 -Phadoop-2.3 -Pyarn`, running `./bin/spark-shell` fails with:
```
java.lang.ClassNotFoundException: org.apache.spark.repl.Main
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:270)
	at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:337)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```
[GitHub] spark pull request: [HOTFIX]: Fix maven build missing some class
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/3228 [HOTFIX]: Fix maven build missing some class The bug was caused by #3159 You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark hotfix_repl Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3228.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3228 commit 7ea18a70cc645f0b39a97119d3860c477f4e987c Author: GuoQiang Li wi...@qq.com Date: 2014-11-12T12:58:45Z HOTFIX: Fix maven build missing some class
[GitHub] spark pull request: [HOTFIX]: Fix maven build missing some class
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/3228#issuecomment-62714530 cc @pwendell @ScrapCodes @srowen
[GitHub] spark pull request: [HOTFIX]: Fix maven build missing some class
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/3228#issuecomment-62849650 How about the following?
```xml
<profile>
  <id>scala-2.10</id>
  <activation>
    <property>
      <name>scala.version</name>
      <value>2.10.4</value>
    </property>
  </activation>
  <properties>
    <scala.binary.version>2.10</scala.binary.version>
    <jline.version>${scala.version}</jline.version>
    <jline.groupid>org.scala-lang</jline.groupid>
  </properties>
  <modules>
    <module>external/kafka</module>
  </modules>
</profile>
```
[GitHub] spark pull request: [HOTFIX]: Fix maven build missing some class
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/3228#issuecomment-62851150 Yes, it seems to work. It seems that the user must explicitly set `scala.version`, though.
[GitHub] spark pull request: [HOTFIX]: Fix maven build missing some class
Github user witgo closed the pull request at: https://github.com/apache/spark/pull/3228
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/3222#issuecomment-63030582 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-4422][MLLIB]In some cases, Vectors.from...
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/3281 [SPARK-4422][MLLIB]In some cases, Vectors.fromBreeze get wrong results. cc @mengxr You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark SPARK-4422 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3281.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3281 commit 7a10123aa35c8558f4913eb5d2b56a84d46f6e82 Author: GuoQiang Li wi...@qq.com Date: 2014-11-15T06:27:42Z In some cases, Vectors.fromBreeze get wrong results.
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/3222#issuecomment-63172769 Sorry, this patch is still a work in progress; I will add comments and documentation later. BTW, my English is poor, so we can communicate by email, which is more efficient.
[GitHub] spark pull request: Bumping version to 1.3.0-SNAPSHOT.
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/3277#issuecomment-63203941 [package.scala#L47](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/package.scala#L47) should be modified
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/3222#issuecomment-63222980 Now the neural-net model is stored in a matrix. The model is able to support a 1000 * 500 * 100 three-layer neural network and a 10 * 1000 two-layer neural network.
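As a rough sanity check of the sizes involved (my own arithmetic, not code from the PR, assuming plain fully connected layers without bias terms), the weight matrices for the 1000 * 500 * 100 topology hold about 550,000 entries:

```python
layers = [1000, 500, 100]
# One weight matrix per pair of adjacent layers: 1000x500 and 500x100.
sizes = [a * b for a, b in zip(layers, layers[1:])]
print(sizes)       # [500000, 50000]
print(sum(sizes))  # 550000
```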
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/3222#discussion_r20410641 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/neuralNetwork/DBN.scala ---
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.neuralNetwork
+
+import org.apache.spark.Logging
+import org.apache.spark.mllib.linalg.{Vector => SV}
+import org.apache.spark.rdd.RDD
+
+class DBN(val stackedRBM: StackedRBM, val nn: NN)
+  extends Logging with Serializable {
+}
+
+object DBN extends Logging {
+  def train(
+      data: RDD[(SV, SV)],
+      batchSize: Int,
+      numIteration: Int,
+      topology: Array[Int],
+      fraction: Double,
+      momentum: Double,
+      weightCost: Double,
+      learningRate: Double): DBN = {
+    val dbn = initializeDBN(topology)
+    pretrain(data, batchSize, numIteration, dbn,
+      fraction, momentum, weightCost, learningRate)
+    NN.train(data, batchSize, numIteration, dbn.nn,
+      fraction, momentum, weightCost, learningRate)
+    dbn
+  }
+
+  private[mllib] def pretrain(
+      data: RDD[(SV, SV)],
+      batchSize: Int,
+      numIteration: Int,
+      dbn: DBN,
+      fraction: Double,
+      momentum: Double,
+      weightCost: Double,
+      learningRate: Double): DBN = {
+    StackedRBM.train(data.map(_._1), batchSize, numIteration, dbn.stackedRBM,
+      fraction, momentum, weightCost, learningRate, dbn.stackedRBM.numLayer - 1)
--- End diff --
The last layer should also be trained.
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/3222#discussion_r20575084 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/neuralNetwork/DBN.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+
+package org.apache.spark.mllib.neuralNetwork
+
+import org.apache.spark.Logging
+import org.apache.spark.mllib.linalg.{Vector => SV}
+import org.apache.spark.rdd.RDD
+
+class DBN(val stackedRBM: StackedRBM, val nn: NN)
+  extends Logging with Serializable {
+}
+
+object DBN extends Logging {
+  def train(
+      data: RDD[(SV, SV)],
+      batchSize: Int,
+      numIteration: Int,
+      topology: Array[Int],
+      fraction: Double,
+      momentum: Double,
+      weightCost: Double,
+      learningRate: Double): DBN = {
+    val dbn = initializeDBN(topology)
+    pretrain(data, batchSize, numIteration, dbn,
+      fraction, momentum, weightCost, learningRate)
+    NN.train(data, batchSize, numIteration, dbn.nn,
+      fraction, momentum, weightCost, learningRate)
+    dbn
+  }
+
+  private[mllib] def pretrain(
+      data: RDD[(SV, SV)],
+      batchSize: Int,
+      numIteration: Int,
+      dbn: DBN,
+      fraction: Double,
+      momentum: Double,
+      weightCost: Double,
+      learningRate: Double): DBN = {
+    StackedRBM.train(data.map(_._1), batchSize, numIteration, dbn.stackedRBM,
+      fraction, momentum, weightCost, learningRate, dbn.stackedRBM.numLayer - 1)
--- End diff -- I see, thanks.
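The off-by-one under discussion (passing `numLayer - 1` leaves the top RBM unpretrained) can be illustrated with a hypothetical Python sketch of greedy layer-wise pretraining; all names here are illustrative, not from the PR:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_stack(data, weights, biases, to_layer=None):
    """Greedy layer-wise pretraining loop: each layer is trained on the
    activations produced by the layers below it. Passing
    to_layer = len(weights) - 1, as in the quoted diff, skips the top
    layer; the fix is to iterate through all len(weights) layers."""
    if to_layer is None:
        to_layer = len(weights)  # train every layer, including the last
    x = data
    for layer in range(to_layer):
        # a real implementation would run contrastive divergence on x here
        # before propagating activations upward to the next layer
        x = sigmoid(x @ weights[layer] + biases[layer])
    return x
```

With `to_layer = len(weights) - 1` the returned activations stop one layer short, mirroring the bug the reviewer points out.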
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/1290#discussion_r20589805 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala --- @@ -0,0 +1,528 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.ann + +import breeze.linalg.{DenseVector, Vector => BV, axpy => brzAxpy} + +import org.apache.spark.mllib.linalg.{Vector, Vectors} +import org.apache.spark.mllib.optimization._ +import org.apache.spark.rdd.RDD +import org.apache.spark.util.random.XORShiftRandom + +/* + * Implements an Artificial Neural Network (ANN) + * + * The data consists of an input vector and an output vector, combined into a single vector + * as follows: + * + * [ ---input--- ---output--- ] + * + * NOTE: output values should be in the range [0,1] + * + * For a network of H hidden layers: + * + * hiddenLayersTopology(h) indicates the number of nodes in hidden layer h, excluding the bias + * node. h counts from 0 (first hidden layer, taking inputs from input layer) to H - 1 (last + * hidden layer, sending outputs to the output layer). 
+ * + * hiddenLayersTopology is converted internally to topology, which adds the number of nodes + * in the input and output layers. + * + * noInput = topology(0), the number of input nodes + * noOutput = topology(L-1), the number of output nodes + * + * input = data( 0 to noInput-1 ) + * output = data( noInput to noInput + noOutput - 1 ) + * + * W_ijl is the weight from node i in layer l-1 to node j in layer l + * W_ijl goes to position ofsWeight(l) + j*(topology(l-1)+1) + i in the weights vector + * + * B_jl is the bias input of node j in layer l + * B_jl goes to position ofsWeight(l) + j*(topology(l-1)+1) + topology(l-1) in the weights vector + * + * error function: E( O, Y ) = sum( (O_j - Y_j)^2 ) + * (with O = (O_0, ..., O_(noOutput-1)) the output of the ANN, + * and (Y_0, ..., Y_(noOutput-1)) the target output from the input data) + * + * node_jl is node j in layer l + * node_jl goes to position ofsNode(l) + j + * + * The weights gradient is defined as dE/dW_ijl and dE/dB_jl + * It has the same mapping as W_ijl and B_jl + * + * For back propagation: + * delta_jl = dE/dS_jl, where S_jl is the output of node_jl, but before applying the sigmoid + * delta_jl has the same mapping as node_jl + * + * Where E = ((estOutput-output),(estOutput-output)), + * the inner product of the difference between estimation and target output with itself. + * + */ + +/** + * Artificial neural network (ANN) model + * + * @param weights the weights between the neurons in the ANN. + * @param topology array containing the number of nodes per layer in the network, including + * the nodes in the input and output layer, but excluding the bias nodes. + */ +class ArtificialNeuralNetworkModel private[mllib](val weights: Vector, val topology: Array[Int]) + extends Serializable with ANNHelper { + + /** + * Predicts values for a single data point using the trained model. + * + * @param testData represents a single data point. + * @return prediction using the trained model. 
+ */
+  def predict(testData: Vector): Vector = {
+    Vectors.dense(computeValues(testData.toArray, weights.toArray))
+  }
+
+  /**
+   * Predict values for an RDD of data points using the trained model.
+   *
+   * @param testDataRDD RDD representing the input vectors.
+   * @return RDD with predictions using the trained model as (input, output) pairs.
+   */
+  def predict(testDataRDD: RDD[Vector]): RDD[(Vector, Vector)] = {
+    testDataRDD.map(T => (T, predict(T)))
+  }
+
+  private def computeValues(arrData: Array[Double], arrWeights: Array[Double]): Array[Double] = {
+    val arrNodes = forwardRun(arrData, arrWeights)
+    arrNodes.slice(arrNodes.size - topology(L
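The flat-vector packing described in the comment above (W_ijl at `ofsWeight(l) + j*(topology(l-1)+1) + i`, B_jl at the same base plus `topology(l-1)`) can be checked with a small Python sketch; the helper names are mine, not from the PR:

```python
def weight_offsets(topology):
    """Start offset of each layer's block in the flat weights vector.
    Layer l contributes topology[l] nodes, each carrying topology[l-1]
    incoming weights plus one bias slot."""
    ofs = [0, 0]  # layer 0 (input) owns no weights; layer 1's block starts at 0
    for l in range(1, len(topology) - 1):
        ofs.append(ofs[l] + topology[l] * (topology[l - 1] + 1))
    return ofs

def w_index(ofs, topology, i, j, l):
    # position of W_ijl, the weight from node i in layer l-1 to node j in layer l
    return ofs[l] + j * (topology[l - 1] + 1) + i

def b_index(ofs, topology, j, l):
    # position of B_jl, the bias input of node j in layer l
    return ofs[l] + j * (topology[l - 1] + 1) + topology[l - 1]
```

For `topology = [2, 3, 1]` the scheme packs 3*(2+1) + 1*(3+1) = 13 slots with no overlaps and no gaps.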
[GitHub] spark pull request: [SPARK-4526][MLLIB]GradientDescent get a wrong...
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/3399 [SPARK-4526][MLLIB]GradientDescent get a wrong gradient value according to the gradient formula. This is caused by the miniBatchSize parameter: the number of records `RDD.sample` returns is not fixed. cc @mengxr You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark GradientDescent Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3399.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3399 commit 606b27a1a6c1e5a1e4c51d01d1f6da9f6ed31524 Author: GuoQiang Li wi...@qq.com Date: 2014-11-21T06:34:50Z GradientDescent get a wrong gradient value according to the gradient formula, which is caused by the miniBatchSize parameter.
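The failure mode being reported can be shown with a hypothetical Python sketch (not Spark code): Bernoulli sampling returns a random number of records, so dividing the gradient sum by the expected batch size (fraction * n) rather than the actual sample count biases the averaged gradient.

```python
import random

def averaged_gradient(points, grad, mini_batch_fraction, seed):
    """Returns (biased, unbiased) gradient averages. The sample drawn by
    Bernoulli sampling has a random size, so the expected-size denominator
    generally differs from the actual count."""
    rng = random.Random(seed)
    sample = [p for p in points if rng.random() < mini_batch_fraction]
    grad_sum = sum(grad(p) for p in sample)
    expected_size = mini_batch_fraction * len(points)  # wrong denominator
    actual_size = max(len(sample), 1)                  # right denominator
    return grad_sum / expected_size, grad_sum / actual_size
```

With a constant per-point gradient of 1.0, the correct average is exactly 1.0, while the expected-size denominator drifts with the realized sample size.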
[GitHub] spark pull request: [SPARK-4526][MLLIB]GradientDescent get a wrong...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/3399#issuecomment-63934659 AmplabJenkins retest this please.
[GitHub] spark pull request: [SPARK-4526][MLLIB]GradientDescent get a wrong...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/3399#issuecomment-63941966 @mengxr I'm not sure. In my test of #3222, the convergence rate of SGD was lower than expected; it may be affected by this issue.
[GitHub] spark pull request: [SPARK-4526][MLLIB]GradientDescent get a wrong...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/3399#discussion_r20754059 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala ---
@@ -185,25 +184,29 @@ object GradientDescent extends Logging {
       val bcWeights = data.context.broadcast(weights)
       // Sample a subset (fraction miniBatchFraction) of the total data
       // compute and sum up the subgradients on this subset (this is one map-reduce)
-      val (gradientSum, lossSum) = data.sample(false, miniBatchFraction, 42 + i)
-        .treeAggregate((BDV.zeros[Double](n), 0.0))(
-          seqOp = (c, v) => (c, v) match { case ((grad, loss), (label, features)) =>
-            val l = gradient.compute(features, label, bcWeights.value, Vectors.fromBreeze(grad))
-            (grad, loss + l)
+      val (gradientSum, lossSum, miniBatchSize) = data.sample(false, miniBatchFraction, 42 + i)
+        .treeAggregate((BDV.zeros[Double](n), 0.0, 0.0))(
--- End diff -- Yes, it should use a `Long` variable.
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/3222#discussion_r20754174 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/neuralNetwork/StackedRBM.scala --- @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+
+package org.apache.spark.mllib.neuralNetwork
+
+import java.util.Random
+
+import scala.collection.JavaConversions._
+
+import breeze.linalg.{DenseVector => BDV, DenseMatrix => BDM, sum => brzSum}
+import breeze.numerics.{sigmoid => brzSigmoid}
+
+import org.apache.spark.broadcast.Broadcast
+import org.apache.spark.Logging
+import org.apache.spark.mllib.linalg.{Vector => SV, DenseVector => SDV}
+import org.apache.spark.mllib.linalg.Vectors
+import org.apache.spark.util.random.XORShiftRandom
+import org.apache.spark.rdd.RDD
+
+class StackedRBM(val innerRBMs: Array[RBM])
+  extends Logging with Serializable {
+  def this(topology: Array[Int]) {
+    this(StackedRBM.initializeRBMs(topology))
+  }
+
+  def numLayer = innerRBMs.length
+
+  def numInput = innerRBMs.head.numVisible
+
+  def numOut = innerRBMs.last.numHidden
+
+  def activateHidden(visible: BDM[Double], toLayer: Int): BDM[Double] = {
+    var x = visible
+    for (layer <- 0 until toLayer) {
+      x = innerRBMs(layer).activateHidden(x)
+      // x = innerRBMs(layer).bernoulli(x)
--- End diff -- Does this need to be converted to a binary vector?
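What the commented-out `bernoulli` step would do can be sketched in Python using the standard RBM formulation; the names are illustrative and not taken from the PR's code:

```python
import numpy as np

def activate_hidden(visible, weight, hidden_bias):
    """Hidden-unit firing probabilities: sigmoid(v W + b)."""
    return 1.0 / (1.0 + np.exp(-(visible @ weight + hidden_bias)))

def bernoulli(probs, rng):
    """Binarize activations by sampling each unit: 1 with probability p,
    else 0. This is the conversion to a binary vector the review asks
    about."""
    return (rng.random(probs.shape) < probs).astype(float)
```

Sampling keeps the hidden states binary, as in the classic contrastive-divergence recipe, while propagating the raw probabilities (as the quoted code does) is a common lower-variance shortcut.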
[GitHub] spark pull request: [SPARK-4530][MLLIB]GradientDescent get a wrong...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/3399#issuecomment-64381732 @mengxr The title has been updated.
[GitHub] spark pull request: Improved build configuration
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/480#discussion_r11933015 --- Diff: pom.xml ---
@@ -506,7 +508,45 @@
       <dependency>
         <groupId>org.apache.avro</groupId>
         <artifactId>avro</artifactId>
-        <version>1.7.4</version>
+        <version>${avro.version}</version>
+        <exclusions>
+          <exclusion>
+            <groupId>org.jboss.netty</groupId>
+            <artifactId>netty</artifactId>
+          </exclusion>
+          <exclusion>
+            <groupId>io.netty</groupId>
+            <artifactId>netty</artifactId>
+          </exclusion>
+        </exclusions>
+      </dependency>
+      <dependency>
+        <groupId>org.apache.avro</groupId>
+        <artifactId>avro-ipc</artifactId>
--- End diff -- spark-hive dependency:
```
[INFO] +- org.apache.hive:hive-serde:jar:0.12.0:compile
[INFO] |  +- org.apache.hive:hive-common:jar:0.12.0:compile
[INFO] |  |  +- org.apache.hive:hive-shims:jar:0.12.0:compile
[INFO] |  |  |  \- commons-logging:commons-logging-api:jar:1.0.4:compile
[INFO] |  |  +- commons-cli:commons-cli:jar:1.2:compile
[INFO] |  |  \- org.apache.commons:commons-compress:jar:1.4.1:compile
[INFO] |  |     \- org.tukaani:xz:jar:1.0:compile
[INFO] |  +- org.mockito:mockito-all:jar:1.8.5:test (version managed from 1.8.2; scope managed from compile)
[INFO] |  +- org.apache.thrift:libfb303:jar:0.9.0:compile
[INFO] |  |  \- org.apache.thrift:libthrift:jar:0.9.0:compile
[INFO] |  |     +- org.apache.httpcomponents:httpclient:jar:4.1.3:compile
[INFO] |  |     \- org.apache.httpcomponents:httpcore:jar:4.1.3:compile
[INFO] |  +- commons-codec:commons-codec:jar:1.4:compile
[INFO] |  +- org.apache.avro:avro:jar:1.7.4:compile (version managed from 1.7.1)
[INFO] |  |  \- com.thoughtworks.paranamer:paranamer:jar:2.3:compile
[INFO] |  \- org.apache.avro:avro-mapred:jar:1.7.1:compile
[INFO] |     \- org.apache.avro:avro-ipc:jar:1.7.1:compile
[INFO] |        +- org.mortbay.jetty:jetty:jar:6.1.26:compile
[INFO] |        +- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
[INFO] |        +- org.apache.velocity:velocity:jar:1.7:compile
[INFO] |        \- org.mortbay.jetty:servlet-api:jar:2.5-20081211:compile
```
spark-streaming-flume dependency:
```
[INFO] +- org.apache.flume:flume-ng-sdk:jar:1.2.0:compile
[INFO] |  +- org.apache.avro:avro:jar:1.7.4:compile
[INFO] |  |  +- org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile
[INFO] |  |  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:compile
[INFO] |  |  +- com.thoughtworks.paranamer:paranamer:jar:2.3:compile
[INFO] |  |  \- org.apache.commons:commons-compress:jar:1.4.1:compile
[INFO] |  |     \- org.tukaani:xz:jar:1.0:compile
[INFO] |  +- org.apache.avro:avro-ipc:jar:1.6.3:compile
[INFO] |  |  +- org.mortbay.jetty:jetty:jar:6.1.26:compile
[INFO] |  |  +- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
[INFO] |  |  \- org.apache.velocity:velocity:jar:1.7:compile
[INFO] |  |     +- commons-collections:commons-collections:jar:3.2.1:compile
[INFO] |  |     \- commons-lang:commons-lang:jar:2.4:compile
```
The two modules depend on inconsistent versions of avro-ipc.
[GitHub] spark pull request: Improved build configuration
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/480#discussion_r11933105 --- Diff: pom.xml ---
@@ -793,6 +833,17 @@
   </build>
   <profiles>
+    <!-- SPARK-1121: Adds an explicit dependency on Avro to work around a Hadoop 0.23.X issue -->
+    <profile>
+      <id>hadoop-0.23</id>
--- End diff -- I have not found this problem in my tests.
[GitHub] spark pull request: SPARK-1119 and other build improvements
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/502#issuecomment-41237973 @berngp @pwendell, can we delete `yarn.version` and use only `hadoop.version`? Would this cause any problems?
[GitHub] spark pull request: SPARK-1119 and other build improvements
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/502#issuecomment-41239453 @berngp Most people use the same version of HDFS and YARN. We could do it like this:
```xml
<hadoop.version>1.0.4</hadoop.version>
<yarn.version>${hadoop.version}</yarn.version>
```
```xml
<profile>
  <id>yarn-alpha</id>
  <properties>
    <hadoop.major.version>2</hadoop.major.version>
    <hadoop.version>0.23.7</hadoop.version>
  </properties>
  <modules>
    <module>yarn</module>
  </modules>
  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-yarn-api</artifactId>
        <version>${yarn.version}</version>
      </dependency>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-yarn-common</artifactId>
        <version>${yarn.version}</version>
      </dependency>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-yarn-client</artifactId>
        <version>${yarn.version}</version>
      </dependency>
    </dependencies>
  </dependencyManagement>
</profile>
```
Most people use `mvn -Pyarn -Dhadoop.version=2.3.0 -DskipTests clean package`. Others use `mvn -Pyarn -Dhadoop.version=2.3.0 -DskipTests -Dyarn.version=0.23.9 clean package`.
[GitHub] spark pull request: Fix SPARK-1609: Executor fails to start when C...
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/547 Fix SPARK-1609: Executor fails to start when Command.extraJavaOptions contains multiple Java options You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark SPARK-1609 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/547.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #547 commit 8a265b7f1084e8d211833dc31633f1f2a16195c6 Author: witgo wi...@qq.com Date: 2014-04-24T17:12:10Z Fix SPARK-1609: Executor fails to start when use spark-submit commit 86fc4bbae56f937e88595a10a01b3db7770e460b Author: witgo wi...@qq.com Date: 2014-04-25T02:51:54Z bugfix commit f7c0ab71ceef4023ec2f63f65d0a7d346e989fa0 Author: witgo wi...@qq.com Date: 2014-04-25T02:55:01Z bugfix commit 1185605f34457767909259f83a8e44be7456d7fe Author: witgo wi...@qq.com Date: 2014-04-25T03:00:29Z fix extraJavaOptions split commit bcf36cb8946fa67c77af8e6ac813808bbb538be0 Author: witgo wi...@qq.com Date: 2014-04-25T03:04:49Z Merge branch 'master' of https://github.com/apache/spark into SPARK-1609
[GitHub] spark pull request: Modify spark.ui.killEnabled default is false
Github user witgo closed the pull request at: https://github.com/apache/spark/pull/510
[GitHub] spark pull request: Fix SPARK-1609: Executor fails to start when C...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/547#discussion_r12023196 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/CommandUtils.scala ---
@@ -48,7 +48,13 @@ object CommandUtils extends Logging {
   def buildJavaOpts(command: Command, memory: Int, sparkHome: String): Seq[String] = {
     val memoryOpts = Seq(s"-Xms${memory}M", s"-Xmx${memory}M")
     // Note, this will coalesce multiple options into a single command component
-    val extraOpts = command.extraJavaOptions.toSeq
+    val extraOpts = command.extraJavaOptions match {
--- End diff -- Yes, `val extraOpts = command.extraJavaOptions.map(Utils.splitCommandString).getOrElse(Seq())` is better.
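The behavior `Utils.splitCommandString` provides can be approximated in Python with the stdlib `shlex` module; this is an analogy for illustration, not Spark's actual implementation:

```python
import shlex

def split_java_options(extra_java_options):
    """Split a single extraJavaOptions string into separate argv tokens,
    honoring quoting, instead of passing the whole string through as one
    option -- the bug class behind SPARK-1609."""
    if extra_java_options is None:
        return []
    return shlex.split(extra_java_options)
```

Passing `-XX:+UseG1GC -Dkey="a b"` as a single token makes the JVM reject the option; splitting it into two tokens (with the quoted value kept intact) is what the fix achieves.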
[GitHub] spark pull request: Fix SPARK-1629: Spark should inline use of com...
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/569 Fix SPARK-1629: Spark should inline use of commons-lang `SystemUtils.IS_... ...OS_WINDOWS` You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark SPARK-1629 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/569.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #569 commit 49e248e7a055c50586bf1f4170c5404566adba23 Author: witgo wi...@qq.com Date: 2014-04-27T02:44:29Z Fix SPARK-1629: Spark should inline use of commons-lang `SystemUtils.IS_OS_WINDOWS`
[GitHub] spark pull request: Fix SPARK-1629: Spark should inline use of com...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/569#discussion_r12027354 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -1056,4 +1055,11 @@ private[spark] object Utils extends Logging {
   def getHadoopFileSystem(path: String): FileSystem = {
     getHadoopFileSystem(new URI(path))
   }
+
+  /**
+   * return true if this is Windows.
+   */
+  def isWindows = Option(System.getProperty("os.name")).
--- End diff -- @srowen
```scala
def isWindows(): Boolean = {
  try {
    val osName = System.getProperty("os.name")
    osName != null && osName.startsWith("Windows")
  } catch {
    case e: SecurityException => false // log a warning and return false
  }
}
```
You think a SecurityException will be thrown here? Why?
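For comparison, a null-safe Python sketch of the same OS check; Python has no direct analogue of Java's `SecurityException` on property reads, which is the point under debate, so this only handles the missing-value case:

```python
import platform

def is_windows():
    """True when running on Windows; written defensively in the same
    spirit as the Option(...) wrapping in the diff above."""
    os_name = platform.system()
    return os_name is not None and os_name.startswith("Windows")
```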
[GitHub] spark pull request: improvements spark-submit usage
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/581 improvements spark-submit usage You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark SPARK-1659 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/581.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #581 commit 0b2cf9856ae68f37c3d2228f7a0d57c3414d760e Author: witgo wi...@qq.com Date: 2014-04-28T16:55:44Z Delete spark-submit obsolete usage: --arg ARG
[GitHub] spark pull request: SPARK-1509: add zipWithIndex zipWithUniqueId m...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/423#discussion_r12080480 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
@@ -263,6 +263,26 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
     rdd.zipPartitions(other.rdd)(fn)(other.classTag, fakeClassTag[V]))(fakeClassTag[V])
   }
+
+  /**
+   * Zips this RDD with generated unique Long ids. Items in the kth partition will get ids k, n+k,
+   * 2*n+k, ..., where n is the number of partitions. So there may exist gaps, but this method
+   * won't trigger a spark job, which is different from [[org.apache.spark.rdd.RDD#zipWithIndex]].
+   */
+  def zipWithUniqueId[Long](): JavaPairRDD[T, Long] = {
--- End diff -- When removing the `[Long]`, the return type becomes `JavaPairRDD<Integer, Object>`.
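The id scheme from the javadoc (item i of partition k gets id i*n + k, for n partitions) can be reproduced in a short Python sketch:

```python
def zip_with_unique_id(partitions):
    """Assigns each item a unique id without a global count or an extra
    job: the i-th item of partition k (out of n partitions) gets id
    i * n + k, so ids are unique but may contain gaps."""
    n = len(partitions)
    return [[(item, i * n + k) for i, item in enumerate(part)]
            for k, part in enumerate(partitions)]
```

Because each partition only needs to know its own index and the partition count, no Spark job is triggered, unlike `zipWithIndex`, which must first count the elements of every partition.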
[GitHub] spark pull request: SPARK-1509: add zipWithIndex zipWithUniqueId m...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/423#discussion_r12081268 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
@@ -263,6 +263,26 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
     rdd.zipPartitions(other.rdd)(fn)(other.classTag, fakeClassTag[V]))(fakeClassTag[V])
   }
+
+  /**
+   * Zips this RDD with generated unique Long ids. Items in the kth partition will get ids k, n+k,
+   * 2*n+k, ..., where n is the number of partitions. So there may exist gaps, but this method
+   * won't trigger a spark job, which is different from [[org.apache.spark.rdd.RDD#zipWithIndex]].
+   */
+  def zipWithUniqueId[Long](): JavaPairRDD[T, Long] = {
--- End diff -- Yes, in my test.
[GitHub] spark pull request: Improved build configuration
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/480#issuecomment-41646101 Cool!
[GitHub] spark pull request: SPARK-1509: add zipWithIndex zipWithUniqueId m...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/423#discussion_r12081885 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
@@ -263,6 +263,26 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
     rdd.zipPartitions(other.rdd)(fn)(other.classTag, fakeClassTag[V]))(fakeClassTag[V])
   }
+
+  /**
+   * Zips this RDD with generated unique Long ids. Items in the kth partition will get ids k, n+k,
+   * 2*n+k, ..., where n is the number of partitions. So there may exist gaps, but this method
+   * won't trigger a spark job, which is different from [[org.apache.spark.rdd.RDD#zipWithIndex]].
+   */
+  def zipWithUniqueId[Long](): JavaPairRDD[T, Long] = {
--- End diff --
```scala
def zipWithUniqueId(): JavaPairRDD[T, JLong] = {
  JavaPairRDD.fromRDD(rdd.zipWithUniqueId()).asInstanceOf[JavaPairRDD[T, JLong]]
}
```
Is this better?
[GitHub] spark pull request: SPARK-1509: add zipWithIndex zipWithUniqueId m...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/423#discussion_r12082137 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
@@ -263,6 +263,26 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
     rdd.zipPartitions(other.rdd)(fn)(other.classTag, fakeClassTag[V]))(fakeClassTag[V])
   }
+
+  /**
+   * Zips this RDD with generated unique Long ids. Items in the kth partition will get ids k, n+k,
+   * 2*n+k, ..., where n is the number of partitions. So there may exist gaps, but this method
+   * won't trigger a spark job, which is different from [[org.apache.spark.rdd.RDD#zipWithIndex]].
+   */
+  def zipWithUniqueId[Long](): JavaPairRDD[T, Long] = {
--- End diff -- @rxin You're right, it has been modified. @mengxr
```scala
def zipWithUniqueId(): JavaPairRDD[T, java.lang.Long] = {
  JavaPairRDD.fromRDD(rdd.zipWithUniqueId().map(x => (x._1, new java.lang.Long(x._2))))
}
```
creates too many objects.
[GitHub] spark pull request: Improved build configuration ②
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/590 Improved build configuration ② @berngp I merged your code into this PR You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark improved_build Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/590.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #590 commit 4e96c0153063b35fc03e497f28292a97832e81d4 Author: Bernardo Gomez Palacio bernardo.gomezpala...@gmail.com Date: 2014-04-15T21:03:30Z Add YARN/Stable compiled classes to the CLASSPATH. The change adds the `./yarn/stable/target/scala-version/classes` to the _Classpath_ when a _dependencies_ assembly is available at the assembly directory. Why is this change necessary? Ease the development features and bug-fixes for Spark-YARN. [ticket: X] : NA Author : bernardo.gomezpala...@gmail.com Reviewer: ? Testing : ? commit 1342886a396be00eda9449c6d84155dfecf954c8 Author: Bernardo Gomez Palacio bernardo.gomezpala...@gmail.com Date: 2014-04-15T21:46:44Z The `spark-class` shell now ignores non jar files in the assembly directory. Why is this change necessary? While developing in Spark I found myself rebuilding either the dependencies assembly or the full spark assembly. I kept running into the case of having both the dep-assembly and full-assembly in the same directory and getting an error when I called either `spark-shell` or `spark-submit`. Quick fix: move either of them as a .bkp file depending on the development work flow you are executing at the moment and enabling the `spark-class` to ignore non-jar files. An other option could be to move the offending jar to a different directory but in my opinion keeping them in there is a bit tidier. e.g. 
``` ll ./assembly/target/scala-2.10 spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0.jar.bkp ``` [ticket: X] : ? commit ddf2547aa2aea8155f8d6c0386e2cb37bcf61537 Author: Bernardo Gomez Palacio bernardo.gomezpala...@gmail.com Date: 2014-04-15T21:53:23Z The `spark-shell` option `--log-conf` also enables the SPARK_PRINT_LAUNCH_COMMAND. Why is this change necessary? Most likely when enabling the `--log-conf` through the `spark-shell` you are also interested in the full invocation of the java command including the _classpath_ and extended options. e.g. ``` INFO: Base Directory set to /Users/bernardo/work/github/berngp/spark INFO: Spark Master is yarn-client INFO: Spark REPL options -Dspark.logConf=true Spark Command: /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java -cp :/Users/bernardo/work/github/berngp/spark/conf:/Users/bernardo/work/github/berngp/spark/core/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/repl/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/mllib/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/bagel/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/graphx/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/streaming/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/tools/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/catalyst/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/core/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/hive/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/yarn/stable/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar:/usr/local/Cellar/hadoop/2.2.0/libexec/etc/hadoop -XX:ErrorFile=/tmp/spark-shell-hs_err_pid.log -XX:HeapDumpPath=/tmp/spark-shell-java_pid.hprof 
-XX:-HeapDumpOnOutOfMemoryError -XX:-PrintGC -XX:-PrintGCDetails -XX:-PrintGCTimeStamps -XX:-PrintTenuringDistribution -XX:-PrintAdaptiveSizePolicy -XX:GCLogFileSize=1024K -XX:-UseGCLogFileRotation -Xloggc:/tmp/spark-shell-gc.log -XX:+UseConcMarkSweepGC -Dspark.cleaner.ttl=1 -Dspark.driver.host=33.33.33.1 -Dspark.logConf=true -Djava.library.path= -Xms400M -Xmx400M org.apache.spark.repl.Main ``` [ticket: X] : ? commit 22045394955992c2c8dfe0e1040c6bb972be6ce4 Author: Bernardo Gomez Palacio bernardo.gomezpala...@gmail.com Date: 2014-04-15T22:15:23Z Root is now Spark and qualify the assembly if it was built with YARN. Why is this change necessary? Renamed the SBT root project to spark to enhance readability. Currently the assembly is qualified with the Hadoop Version
[GitHub] spark pull request: [WIP] Improved build configuration III
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/598 [WIP] Improved build configuration III You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark sql-pom Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/598.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #598 commit 3d175194f27c86a605a9b65bbef2e51a551178e7 Author: witgo wi...@qq.com Date: 2014-04-30T08:32:23Z Improved build configuration III
[GitHub] spark pull request: [WIP] Improved build configuration III
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/598#issuecomment-41810943 @pwendell Now I have a very radical idea: removing support for sbt. What problems would that cause?
[GitHub] spark pull request: Improved build configuration II
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/590#discussion_r12181447 --- Diff: project/SparkBuild.scala --- @@ -55,7 +55,7 @@ object SparkBuild extends Build { val SCALAC_JVM_VERSION = "jvm-1.6" val JAVAC_JVM_VERSION = "1.6" - lazy val root = Project("root", file("."), settings = rootSettings) aggregate(allProjects: _*) + lazy val root = Project("spark", file("."), settings = rootSettings) aggregate(allProjects: _*) --- End diff -- Just to increase readability.
[GitHub] spark pull request: Improved build configuration II
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/590#issuecomment-41890856 @pwendell I have removed the Travis changes.
[GitHub] spark pull request: [SPARK-1681] Include datanucleus jars in Spark...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/610#issuecomment-41911017 There is another solution: #598
[GitHub] spark pull request: SPARK-1695: java8-tests compiler error: packag...
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/611 SPARK-1695: java8-tests compiler error: package com.google.common.co... ...llections does not exist You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark SPARK-1695 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/611.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #611 commit d77a8875f30d460bdd5e301e30beb88d11fa5138 Author: witgo wi...@qq.com Date: 2014-05-01T16:03:08Z Fix SPARK-1695: java8-tests compiler error: package com.google.common.collections does not exist
[GitHub] spark pull request: Improved build configuration II
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/590#issuecomment-42094359 @tgravescs I tested many times; these all pass: `mvn clean package -DskipTests -Pyarn-alpha -Dhadoop.version=0.23.7 -Phadoop-0.23` `mvn clean package -DskipTests -Pyarn-alpha -Dhadoop.version=0.23.9 -Phadoop-0.23` `mvn clean package -DskipTests -Pyarn-alpha -Dhadoop.version=0.23.7 -Phadoop-0.23 -Dyarn.version=0.23.10` Is your code not the latest?
[GitHub] spark pull request: SPARK-1699: Python relative independence from ...
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/624 SPARK-1699: Python relative independence from the core, becomes subprojects You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark python-api Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/624.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #624 commit d9a31db82b30ebfa6c27227507e2e20bb1e8d08a Author: witgo wi...@qq.com Date: 2014-05-03T06:20:52Z SPARK-1699: Python relative should be independence from the core, becomes subprojects
[GitHub] spark pull request: Improved build configuration II
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/590#issuecomment-42098647 @pwendell How about [this solution](https://github.com/witgo/spark/commit/0ed124dc0e453a0a59d3c387651be970859a9a0a)? It only excludes the servlet-api 2.5 dependency.
[GitHub] spark pull request: add yarn.version for profile yarn and yarn-alp...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/625#issuecomment-42102176 [PR 590](https://github.com/apache/spark/pull/590) contains the relevant changes.
[GitHub] spark pull request: Improve build configuration II
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/590#issuecomment-42102979 Hi @pwendell, @srowen. All the changes are very small, and [this solution](https://github.com/witgo/spark/commit/0ed124dc0e453a0a59d3c387651be970859a9a0a) only affects hadoop 2.3.x and 2.4.x. Can it be merged into 1.0? Your views?
[GitHub] spark pull request: The default version of yarn is equal to the ha...
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/626 The default version of yarn is equal to the hadoop version This is a part of [PR 590](https://github.com/apache/spark/pull/590) You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark yarn_version Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/626.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #626 commit c76763b875beedba0a144efe1d3b814cfc8b811b Author: witgo wi...@qq.com Date: 2014-05-03T13:57:09Z The default value of yarn.version is equal to hadoop.version
[GitHub] spark pull request: [WIP] SPARK-1699: Python relative independence...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/624#issuecomment-42107623 The branch is wrong; temporarily closing this.
[GitHub] spark pull request: [WIP] SPARK-1699: Python relative independence...
Github user witgo closed the pull request at: https://github.com/apache/spark/pull/624
[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/468#issuecomment-42109604 @srowen Not everyone uses the same version of HDFS and YARN.
[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/468#issuecomment-42110042 @srowen Related discussion in [PR 502](https://github.com/apache/spark/pull/502). @berngp Can you explain the reason for not using the same version of HDFS and YARN?
[GitHub] spark pull request: Improve build configuration II
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/590#issuecomment-42120300 @pwendell I did not notice this; it has been modified.
[GitHub] spark pull request: The default version of yarn is equal to the ha...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/626#discussion_r12259307 --- Diff: pom.xml --- @@ -558,65 +560,8 @@ <artifactId>jets3t</artifactId> <version>0.7.1</version> </dependency> - <dependency> --- End diff -- You're right, but with `mvn -Pyarn clean package` the hadoop version is 2.2.0.
[GitHub] spark pull request: The default version of yarn is equal to the ha...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/626#discussion_r12259340 --- Diff: pom.xml --- @@ -558,65 +560,8 @@ <artifactId>jets3t</artifactId> <version>0.7.1</version> </dependency> - <dependency> --- End diff --

| maven | hadoop.version | yarn.version |
|:--|:---:|:-:|
| `mvn -Pyarn -DskipTests clean package` | 2.2.0 | 2.2.0 |
| `mvn -Phadoop-0.23 -Pyarn-alpha -DskipTests clean package` | 0.23.7 | 0.23.7 |
| `mvn -Pyarn-alpha -Dhadoop.version=2.0.0-cdh4.2.0 -DskipTests clean package` | 2.0.0-cdh4.2.0 | 2.0.0-cdh4.2.0 |
| `mvn -Phadoop-0.23 -Pyarn-alpha -Dhadoop.version=2.3.0 -Dyarn.version=0.23.7 -DskipTests clean package` | 2.3.0 | 0.23.7 |
| `mvn -DskipTests clean package` | 1.0.4 | not supported |
| `mvn -Pyarn-alpha -Dyarn.version=0.23.7 -Dhadoop.version=1.0.4 -Phadoop-0.23 -DskipTests package` | 1.0.4 | 0.23.7 |
[GitHub] spark pull request: The default version of yarn is equal to the ha...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/626#discussion_r12259611 --- Diff: pom.xml --- @@ -558,65 +560,8 @@ <artifactId>jets3t</artifactId> <version>0.7.1</version> </dependency> - <dependency> --- End diff -- In `mvn -DskipTests clean package` the dependency declarations of `hadoop-yarn-api`, `hadoop-yarn-common` and `hadoop-yarn-client` are not necessary.
[GitHub] spark pull request: The default version of yarn is equal to the ha...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/626#discussion_r12259715 --- Diff: pom.xml --- @@ -558,65 +560,8 @@ <artifactId>jets3t</artifactId> <version>0.7.1</version> </dependency> - <dependency> --- End diff -- When `hadoop.version` is 1.0.4, `yarn.version` is also 1.0.4, so <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-yarn-client</artifactId> <version>${yarn.version}</version> </dependency> is not correct.
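The defaulting behavior this thread describes ("The default version of yarn is equal to the hadoop version") could be expressed in the parent pom as a property fallback. This is a sketch of the idea, not the exact patch; the property placement is an assumption:

```
<properties>
  <hadoop.version>1.0.4</hadoop.version>
  <!-- yarn.version follows hadoop.version unless overridden
       on the command line with -Dyarn.version=... -->
  <yarn.version>${hadoop.version}</yarn.version>
</properties>
```

With this shape, `mvn -Pyarn -Dhadoop.version=2.2.0 ...` builds YARN 2.2.0 as well, while `-Dyarn.version=0.23.7` still decouples the two, matching the table posted earlier in the thread.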
[GitHub] spark pull request: SPARK-1693: Most of the tests throw a java.lan...
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/628 SPARK-1693: Most of the tests throw a java.lang.SecurityException when s... ...park built for hadoop 2.3.0 , 2.4.0 You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark SPARK-1693_new Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/628.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #628 commit dc63905908cb7c84c741bb5fdc4ad7d4abdcb0b2 Author: witgo wi...@qq.com Date: 2014-05-04T06:43:43Z SPARK-1693: Most of the tests throw a java.lang.SecurityException when spark built for hadoop 2.3.0 , 2.4.0
[GitHub] spark pull request: SPARK-1556. jets3t dep doesn't update properly...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/629#discussion_r12261160 --- Diff: core/pom.xml --- @@ -38,12 +38,6 @@ <dependency> <groupId>net.java.dev.jets3t</groupId> <artifactId>jets3t</artifactId> - <exclusions> - <exclusion> - <groupId>commons-logging</groupId> - <artifactId>commons-logging</artifactId> - </exclusion> - </exclusions> --- End diff -- Why remove it?
[GitHub] spark pull request: SPARK-1556. jets3t dep doesn't update properly...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/629#issuecomment-42132419 Looks good to me.
[GitHub] spark pull request: SPARK-1699: Python relative independent, becom...
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/631 SPARK-1699: Python relative independent, becomes a subproject You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark SPARK-1699 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/631.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #631 commit 74ffefb453dd14f49d88c7a7b8a406b82f325c56 Author: witgo wi...@qq.com Date: 2014-05-04T16:05:44Z SPARK-1699: Python relative independence from the core, becomes subprojects
[GitHub] spark pull request: Add missing description to spark-env.sh.templa...
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/646 Add missing description to spark-env.sh.template You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark spark_env Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/646.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #646 commit 9a95a564593ad4071486abb51750cbf6c9b921ff Author: witgo wi...@qq.com Date: 2014-05-05T10:25:04Z Add missing description to spark-env.sh.template
[GitHub] spark pull request: SPARK-1734: spark-submit throws an exception: ...
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/665 SPARK-1734: spark-submit throws an exception: Exception in thread main... ... java.lang.ClassNotFoundException: org.apache.spark.broadcast.TorrentBroadcastFactory You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark SPARK-1734 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/665.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #665 commit cacf23852027fb3d0fb5a020f1e9216bba0468d3 Author: witgo wi...@qq.com Date: 2014-05-06T09:13:49Z SPARK-1734: spark-submit throws an exception: Exception in thread main java.lang.ClassNotFoundException: org.apache.spark.broadcast.TorrentBroadcastFactory
[GitHub] spark pull request: SPARK-1699: Python relative independent, becom...
Github user witgo closed the pull request at: https://github.com/apache/spark/pull/631
[GitHub] spark pull request: SPARK-1712: TaskDescription instance is too bi...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/677#discussion_r12364638 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -414,6 +415,14 @@ private[spark] class TaskSetManager( // we assume the task can be serialized without exceptions. val serializedTask = Task.serializeWithDependencies( task, sched.sc.addedFiles, sched.sc.addedJars, ser) + if (serializedTask.limit >= akkaFrameSize - 1024) { --- End diff -- `serializedTask` = `4356 bytes` `LaunchTask` = `4797 bytes`
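The guard being discussed can be sketched outside Spark (names and the 10 MB illustrative frame size are assumptions, not the real TaskSetManager code): a serialized task whose size reaches the frame size minus a reserved margin must be rejected rather than sent, since the enclosing LaunchTask message adds overhead (4797 vs 4356 bytes in the measurement above).

```java
public class FrameSizeGuardSketch {
    // Hypothetical margin reserved for the LaunchTask envelope around the
    // serialized task bytes; the PR discussion uses 1024 for this purpose.
    static final int RESERVED = 1024;

    static boolean exceedsFrameSize(int serializedTaskBytes, int frameSize) {
        return serializedTaskBytes >= frameSize - RESERVED;
    }

    public static void main(String[] args) {
        // The measured 4356-byte task fits an illustrative 10 MB frame...
        System.out.println(exceedsFrameSize(4356, 10 * 1024 * 1024)); // false
        // ...but not a 5000-byte frame once the 1024-byte margin is reserved,
        // even though 4356 < 5000 on its own.
        System.out.println(exceedsFrameSize(4356, 5000)); // true
    }
}
```

The margin is the point of the comparison: comparing the raw task size against the full frame size would let through tasks whose wrapped LaunchTask message still exceeds the frame.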
[GitHub] spark pull request: SPARK-1712: TaskDescription instance is too bi...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/677#discussion_r12363925 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -414,6 +415,14 @@ private[spark] class TaskSetManager( // we assume the task can be serialized without exceptions. val serializedTask = Task.serializeWithDependencies( task, sched.sc.addedFiles, sched.sc.addedJars, ser) + if (serializedTask.limit >= akkaFrameSize - 1024) { --- End diff -- The reference [Executor.scala#L235](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L235) may not fit here.
[GitHub] spark pull request: improve the readability of SparkContext.scala
Github user witgo closed the pull request at: https://github.com/apache/spark/pull/414
[GitHub] spark pull request: SPARK-1712: TaskDescription instance is too bi...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/677#discussion_r12364078 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -414,6 +415,14 @@ private[spark] class TaskSetManager( // we assume the task can be serialized without exceptions. val serializedTask = Task.serializeWithDependencies( task, sched.sc.addedFiles, sched.sc.addedJars, ser) + if (serializedTask.limit >= akkaFrameSize - 1024) { + val msg = "Serialized task %s:%d were %d bytes which" + --- End diff -- The reason for this is to keep the same style as the rest of the code.
[GitHub] spark pull request: SPARK-1712: TaskDescription instance is too bi...
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/677 SPARK-1712: TaskDescription instance is too big causes Spark to hang You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark SPARK-1712 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/677.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #677 commit e6578400ce58104d2b022f62110ac83f82a92872 Author: witgo wi...@qq.com Date: 2014-05-07T05:12:34Z SPARK-1712: TaskDescription instance is too big causes Spark to hang
[GitHub] spark pull request: [WIP] update scalatest to version 2.1.5
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/713 [WIP] update scalatest to version 2.1.5 You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark scalatest Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/713.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #713 commit c4589286f534c6e720954c0433903643c73d201e Author: witgo wi...@qq.com Date: 2014-05-09T03:16:50Z update scalatest to version 2.1.5 commit 2c543b93fb3eb67b0e88e8fdeb5380731e68651c Author: witgo wi...@qq.com Date: 2014-05-09T05:27:23Z fix ReplSuite.scala
[GitHub] spark pull request: [WIP]SPARK-1712: TaskDescription instance is t...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/677#issuecomment-42442064 @pwendell How about [this solution](https://github.com/witgo/spark/compare/SPARK-1712_new)?
[GitHub] spark pull request: fix building spark with maven documentation
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/712 fix building spark with maven documentation You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark building-with-maven Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/712.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #712 commit 215523bdcd50379538a204d256a4dbdaab5a8db7 Author: witgo wi...@qq.com Date: 2014-05-09T08:34:40Z fix building spark with maven documentation
[GitHub] spark pull request: 【SPARK-1779】add warning when memoryFractio...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/714#issuecomment-42729492 【SPARK-1779】 = [SPARK-1779]
[GitHub] spark pull request: [SPARK-1644] The org.datanucleus:* should not ...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/688#issuecomment-42730561 @pwendell It has been updated.
[GitHub] spark pull request: SPARK-1756: Add missing description to spark-e...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/646#discussion_r12507330

--- Diff: conf/spark-env.sh.template ---
@@ -38,6 +38,7 @@
# - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. -Dx=y)
+# - SPARK_DRIVER_MEMORY, Memory for driver (e.g. 1000M, 2G) (Default: 512 Mb)
--- End diff --

`./bin/spark-shell --driver-memory 2g` =
```
/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java -cp ::/Users/witgo/work/code/java/spark/dist/conf:/Users/witgo/work/code/java/spark/dist/lib/spark-assembly-1.0.0-SNAPSHOT-hadoop0.23.9.jar -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit spark-internal --driver-memory 2g --class org.apache.spark.repl.Main
```
[GitHub] spark pull request: remove outdated runtime Information scala home
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/728 remove outdated runtime Information scala home You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark scala_home Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/728.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #728 commit fac094ad2b68415285d85c67754deda4e2bee116 Author: witgo wi...@qq.com Date: 2014-05-11T04:27:31Z remove outdated runtime Information scala home
[GitHub] spark pull request: SPARK-1756: Add missing description to spark-e...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/646#discussion_r12507598

--- Diff: conf/spark-env.sh.template ---
@@ -38,6 +38,7 @@
# - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. -Dx=y)
+# - SPARK_DRIVER_MEMORY, Memory for driver (e.g. 1000M, 2G) (Default: 512 Mb)
--- End diff --

If so,
```
if [ ! -z "$DRIVER_MEMORY" ] && [ ! -z "$DEPLOY_MODE" ] && [ "$DEPLOY_MODE" = "client" ]; then
  export SPARK_MEM="$DRIVER_MEMORY"
fi
```
is not correct.
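As a standalone sketch, the check being discussed looks like this once the `&&` operators (eaten by the mail archive) are restored and the variables are quoted; the variable values below are made up purely for illustration:

```shell
#!/bin/sh
# Hypothetical values standing in for what the launcher scripts would set.
DRIVER_MEMORY=2g
DEPLOY_MODE=client

# Only forward the requested driver memory to SPARK_MEM in client mode,
# and only when both variables are actually set (non-empty).
if [ ! -z "$DRIVER_MEMORY" ] && [ ! -z "$DEPLOY_MODE" ] && [ "$DEPLOY_MODE" = "client" ]; then
  export SPARK_MEM="$DRIVER_MEMORY"
fi

echo "SPARK_MEM=$SPARK_MEM"
```

Without the `&&` between the bracketed tests the snippet is not valid shell, which is why the quoted version reads ambiguously in the archive.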
[GitHub] spark pull request: SPARK-1756: Add missing description to spark-e...
Github user witgo closed the pull request at: https://github.com/apache/spark/pull/646
[GitHub] spark pull request: SPARK-1756: Add missing description to spark-e...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/646#discussion_r12507619

--- Diff: conf/spark-env.sh.template ---
@@ -38,6 +38,7 @@
# - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. -Dx=y)
+# - SPARK_DRIVER_MEMORY, Memory for driver (e.g. 1000M, 2G) (Default: 512 Mb)
--- End diff --

Yes, it works for me. `./bin/spark-shell --driver-memory 2g` =
```
/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java -cp ::/Users/witgo/work/code/java/spark/dist/conf:/Users/witgo/work/code/java/spark/dist/lib/spark-assembly-1.0.0-SNAPSHOT-hadoop0.23.9.jar -Djava.library.path= -Xms2g -Xmx2g org.apache.spark.deploy.SparkSubmit spark-internal --driver-memory 2g --class org.apache.spark.repl.Main
```
But in `--driver-memory 2g --class org.apache.spark.repl.Main`, the `--driver-memory 2g` is unnecessary.