[GitHub] spark pull request: [Minor] rat exclude dependency-reduced-pom.xml

2014-09-09 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/2326#issuecomment-55064478
  
Jenkins, retest this please.





[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-10 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1983#issuecomment-55094784
  
@allwefantasy
My test corpus is 196558 documents containing 7897767 words in total,
and the number of iterations is 100.
How many words in total do your 240,000 documents contain?
You can post a screenshot of the stage page; the URL usually looks like
stages/stage?id=226 or similar.
A screenshot like the following:

![qq20140910-1](https://cloud.githubusercontent.com/assets/302879/4215649/df5ead34-38d0-11e4-868a-54553dc4f910.png)






[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-10 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1983#issuecomment-55095890
  
@allwefantasy 
I think the code `Document(parts(0).toInt, (0 until
wordInfo.value.size).map(k => values.getOrElse(k, 0)).toArray)` here is a
bit problematic.
It should be handled like this: each word corresponds to an integer,
e.g. a = 1, b = 2, c = 3, d = 4 ... z = 27.
If the tokenized sequence of your document is a, b, d, a (three distinct
words), the Document instance is created as `Document(1, Array(1, 2, 4, 1))`.
The Document content is essentially the tokenization result of the
document; the string array a, b, d, a is simply represented as an integer
array.
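
To make the encoding concrete, here is a minimal sketch (my illustration,
assuming a `Document(docId: Int, content: Array[Int])` case class like the
one in this PR; the word-to-id dictionary is purely illustrative):

```scala
// Build the word-to-integer encoding described above.
case class Document(docId: Int, content: Array[Int])

val wordToId = Map("a" -> 1, "b" -> 2, "c" -> 3, "d" -> 4)
val tokens = Seq("a", "b", "d", "a") // the tokenized document
val doc = Document(1, tokens.map(wordToId).toArray) // Document(1, Array(1, 2, 4, 1))
```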






[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-10 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1983#issuecomment-55096559
  
@srowen I will try to translate the comments into English.





[GitHub] spark pull request: SPARK-3470 [CORE] [STREAMING] Add Closeable / ...

2014-09-10 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/2346#issuecomment-55132817
  
The relevant PR: #991





[GitHub] spark pull request: SPARK-2482: Resolve sbt warnings during build

2014-09-10 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1330#issuecomment-55210927
  
The code has been updated.





[GitHub] spark pull request: SPARK-2482: Resolve sbt warnings during build

2014-09-10 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/1330#discussion_r17401380
  
--- Diff: pom.xml ---
@@ -839,7 +839,6 @@
    <arg>-unchecked</arg>
    <arg>-deprecation</arg>
    <arg>-feature</arg>
-   <arg>-language:postfixOps</arg>
--- End diff --

@jkbradley  I removed this parameter. The related discussion is in #1069.





[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-11 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1983#issuecomment-55223673
  
@allwefantasy  Spark can adjust the number of tasks an executor runs
concurrently. If you want each executor to be able to run 17 tasks at the
same time, add the following setting to the `conf/spark-defaults.conf` file:

 spark.executor.cores 17





[GitHub] spark pull request: SPARK-2482: Resolve sbt warnings during build

2014-09-11 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1330#issuecomment-55236678
  
No postfix warnings in 179ba61.





[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...

2014-09-11 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1983#issuecomment-55280269
  
@allwefantasy  The existing code creates too many TopicModel instances
during the iterative computation; I am now trying to fix this.
Thanks for your feedback.





[GitHub] spark pull request: [SPARK-2491] Don't handle uncaught exceptions ...

2014-11-01 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1482#issuecomment-61363413
  
Jenkins, retest this please.





[GitHub] spark pull request: Spark shell class path is not correctly set if...

2014-11-01 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/3050

Spark shell class path is not correctly set if 
spark.driver.extraClassPath is set in defaults.conf



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark SPARK-4161

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3050.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3050


commit 38890abc5b87222e25998788a35f29d994f08050
Author: GuoQiang Li wi...@qq.com
Date:   2014-11-01T17:30:11Z

Spark shell class path is not correctly set if 
spark.driver.extraClassPath is set in defaults.conf







[GitHub] spark pull request: [SPARK-4161]Spark shell class path is not corr...

2014-11-01 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/3051

[SPARK-4161]Spark shell class path is not correctly set if 
spark.driver.extraClassP...

...ath is set in defaults.conf.(branch-1.1 backport)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark SPARK-4161_1.1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3051.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3051


commit 44bad33cf8bd3da2445c606985090041b3154b7b
Author: GuoQiang Li wi...@qq.com
Date:   2014-11-01T17:32:56Z

Spark shell class path is not correctly set if 
spark.driver.extraClassPath is set in defaults.conf







[GitHub] spark pull request: [Minor] Minor bug fixes in bin/run-example

2014-11-02 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/3069

[Minor] Minor bug fixes in bin/run-example

`./sbt/sbt clean assembly` =>
`examples/target/scala-2.10/spark-examples_2-10-1.2.0-SNAPSHOT-hadoop1.0.4.jar`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark run-example

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3069.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3069


commit e13fab79974eca197daee72b21bea57dccb3d8fb
Author: GuoQiang Li wi...@qq.com
Date:   2014-11-03T06:47:25Z

Small bug fixes in bin/run-example







[GitHub] spark pull request: [Minor] Minor bug fixes in bin/run-example

2014-11-03 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/3069#issuecomment-61449081
  
`mvn package` generates a file named like `spark-examples-1.2.0-SNAPSHOT-*`,
while `./sbt/sbt clean assembly` generates a file named like
`spark-examples_2-10-1.2.0-SNAPSHOT-*` (note the extra `_2-10` here).

The glob `spark-examples-*hadoop*.jar` only matches
`spark-examples-1.2.0-SNAPSHOT-*`.





[GitHub] spark pull request: [Minor] Minor bug fixes in bin/run-example

2014-11-03 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/3069#issuecomment-61463811
  
The current solution is simple to implement, and other scripts already use
the same approach, e.g.:

[compute-classpath.cmd#L52](https://github.com/apache/spark/blob/master/bin/compute-classpath.cmd#L52)





[GitHub] spark pull request: [SPARK-3797] Run external shuffle service in Y...

2014-11-06 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/3082#discussion_r19935139
  
--- Diff: make-distribution.sh ---
@@ -181,6 +181,9 @@ echo Spark $VERSION$GITREVSTRING built for Hadoop 
$SPARK_HADOOP_VERSION  $DI
 # Copy jars
 cp $FWDIR/assembly/target/scala*/*assembly*hadoop*.jar $DISTDIR/lib/
 cp $FWDIR/examples/target/scala*/spark-examples*.jar $DISTDIR/lib/
+cp $FWDIR/network/yarn/target/scala*/spark-network-yarn*.jar $DISTDIR/lib/
--- End diff --

@andrewor14 
There is a problem here:
I used this command line: `./make-distribution.sh
-Dhadoop.version=2.3.0-cdh5.0.1 -Dyarn.version=2.3.0-cdh5.0.1 -Phadoop-2.3
-Pyarn  -Pnetlib-lgpl`, but
`$FWDIR/network/yarn/target/scala*/spark-network-yarn*.jar` does not exist.





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-06 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-62085010
  
We should use matrix operations to compute the forward propagation and
back propagation; see
http://deeplearning.stanford.edu/wiki/index.php/Neural_Network_Vectorization
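
For illustration, a minimal sketch of such a vectorized forward pass in
Breeze (my example, not code from this PR): one matrix product activates a
whole layer for a whole mini-batch at once.

```scala
import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV}
import breeze.numerics.sigmoid

// x holds one example per column, so w * x computes the pre-activations
// of every example in the mini-batch in a single matrix product.
def forward(w: BDM[Double], b: BDV[Double], x: BDM[Double]): BDM[Double] = {
  val z = w * x                             // (numHidden x batchSize)
  for (j <- 0 until z.cols) z(::, j) :+= b  // add the bias to every column
  sigmoid(z)                                // element-wise activation
}
```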





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-06 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-62089562
  
We cannot use the existing Gradient classes; the whole iterative process
should be carried out as matrix computations. Moreover, we can follow the
design of the ALS algorithm and cut the matrix into appropriately sized
blocks.
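
A rough sketch of the kind of blocking meant here (my illustration; the
actual ALS partitioning is more involved):

```scala
import breeze.linalg.{DenseMatrix => BDM}

// Cut a weight matrix into column blocks of at most blockSize columns so
// the blocks can be distributed across partitions, ALS-style.
def columnBlocks(w: BDM[Double], blockSize: Int): Seq[BDM[Double]] =
  (0 until w.cols by blockSize).map { start =>
    w(::, start until math.min(start + blockSize, w.cols)).copy
  }
```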





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-08 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-62254979
  
I agree with what @debasish83 said. We should find a suitable solution for
distributed storage of the weight matrix.





[GitHub] spark pull request: [WIP][SPARK-4251][MLLIB]Add Restricted Boltzma...

2014-11-12 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/3222

[WIP][SPARK-4251][MLLIB]Add Restricted Boltzmann machine(RBM) algorithm to 
MLlib



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark rbm

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3222.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3222


commit 8ced3e8784c75dbb0c874fe207db61aa1f8e6e7b
Author: GuoQiang Li wi...@qq.com
Date:   2014-11-12T09:15:09Z

Add Restricted Boltzmann machine(RBM) algorithm to MLlib







[GitHub] spark pull request: Support cross building for Scala 2.11

2014-11-12 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/3159#issuecomment-62700413
  
@pwendell  @ScrapCodes 
This patch has a bug: 
`./make-distribution.sh -Dhadoop.version=2.3.0-cdh5.0.1 
-Dyarn.version=2.3.0-cdh5.0.1 -Phadoop-2.3 -Pyarn` 
`./bin/spark-shell` =>
```
java.lang.ClassNotFoundException: org.apache.spark.repl.Main
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at 
org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:337)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```





[GitHub] spark pull request: [HOTFIX]: Fix maven build missing some class

2014-11-12 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/3228

[HOTFIX]: Fix maven build missing some class

The bug was caused by #3159.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark hotfix_repl

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3228.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3228


commit 7ea18a70cc645f0b39a97119d3860c477f4e987c
Author: GuoQiang Li wi...@qq.com
Date:   2014-11-12T12:58:45Z

HOTFIX: Fix maven build missing some class







[GitHub] spark pull request: [HOTFIX]: Fix maven build missing some class

2014-11-12 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/3228#issuecomment-62714530
  
cc @pwendell  @ScrapCodes  @srowen





[GitHub] spark pull request: [HOTFIX]: Fix maven build missing some class

2014-11-12 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/3228#issuecomment-62849650
  
How about the following?
```xml
<profile>
  <id>scala-2.10</id>
  <activation>
    <property>
      <name>scala.version</name>
      <value>2.10.4</value>
    </property>
  </activation>
  <properties>
    <scala.binary.version>2.10</scala.binary.version>
    <jline.version>${scala.version}</jline.version>
    <jline.groupid>org.scala-lang</jline.groupid>
  </properties>
  <modules>
    <module>external/kafka</module>
  </modules>
</profile>
```





[GitHub] spark pull request: [HOTFIX]: Fix maven build missing some class

2014-11-12 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/3228#issuecomment-62851150
  
Yes, it seems to work. It seems that the user must explicitly set
`scala.version`.





[GitHub] spark pull request: [HOTFIX]: Fix maven build missing some class

2014-11-13 Thread witgo
Github user witgo closed the pull request at:

https://github.com/apache/spark/pull/3228





[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...

2014-11-14 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/3222#issuecomment-63030582
  
Jenkins, retest this please. 





[GitHub] spark pull request: [SPARK-4422][MLLIB]In some cases, Vectors.from...

2014-11-14 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/3281

[SPARK-4422][MLLIB]In some cases, Vectors.fromBreeze get wrong results.

cc @mengxr
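
For context, a minimal sketch of the kind of case the title points at,
under my assumption (not text from this PR) that the trouble involves
Breeze vector views whose backing `data` array is larger than the vector
itself:

```scala
import breeze.linalg.{DenseVector => BDV}

// Hypothetical reproduction sketch: slicing a Breeze vector returns a view
// that shares the full backing array, so a conversion that reads v.data
// directly would pick up the wrong elements.
val full = BDV(1.0, 2.0, 3.0, 4.0)
val view = full(1 to 2)
println(view)                    // DenseVector(2.0, 3.0)
println(view.data.mkString(",")) // 1.0,2.0,3.0,4.0 -- the shared backing array
```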

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark SPARK-4422

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3281.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3281


commit 7a10123aa35c8558f4913eb5d2b56a84d46f6e82
Author: GuoQiang Li wi...@qq.com
Date:   2014-11-15T06:27:42Z

In some cases, Vectors.fromBreeze get wrong results.







[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...

2014-11-15 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/3222#issuecomment-63172769
  
Sorry, this patch is still a work in progress. I will add comments and
documentation later.
BTW, my English is poor; we can communicate by email, which is more
efficient.





[GitHub] spark pull request: Bumping version to 1.3.0-SNAPSHOT.

2014-11-15 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/3277#issuecomment-63203941
  

[package.scala#L47](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/package.scala#L47)
 should be modified





[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...

2014-11-16 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/3222#issuecomment-63222980
  
Now the neural net model is stored in a matrix. The model is able to
support a 1000 * 500 * 100 three-layer neural network and a 10 * 1000
two-layer neural network.
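
As a rough sanity check on those sizes, a sketch of the parameter count
such a topology implies (assuming fully connected layers with one bias per
non-input node; the exact layout in this PR may differ):

```scala
// Count the parameters of a fully connected network for a given topology:
// one weight matrix plus one bias vector per layer transition.
def paramCount(topology: Array[Int]): Long =
  topology.sliding(2).map { case Array(in, out) => in.toLong * out + out }.sum

println(paramCount(Array(1000, 500, 100))) // 550600
println(paramCount(Array(10, 1000)))       // 11000
```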





[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...

2014-11-16 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/3222#discussion_r20410641
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/neuralNetwork/DBN.scala ---
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.neuralNetwork
+
+import org.apache.spark.Logging
+import org.apache.spark.mllib.linalg.{Vector => SV}
+import org.apache.spark.rdd.RDD
+
+class DBN(val stackedRBM: StackedRBM, val nn: NN)
+  extends Logging with Serializable {
+}
+
+object DBN extends Logging {
+  def train(
+data: RDD[(SV, SV)],
+batchSize: Int,
+numIteration: Int,
+topology: Array[Int],
+fraction: Double,
+momentum: Double,
+weightCost: Double,
+learningRate: Double): DBN = {
+val dbn = initializeDBN(topology)
+pretrain(data, batchSize, numIteration, dbn,
+  fraction, momentum, weightCost, learningRate)
+NN.train(data, batchSize, numIteration, dbn.nn,
+  fraction, momentum, weightCost, learningRate)
+dbn
+  }
+
+  private[mllib] def pretrain(
+data: RDD[(SV, SV)],
+batchSize: Int,
+numIteration: Int,
+dbn: DBN,
+fraction: Double,
+momentum: Double,
+weightCost: Double,
+learningRate: Double): DBN = {
+StackedRBM.train(data.map(_._1), batchSize, numIteration, 
dbn.stackedRBM,
+  fraction, momentum, weightCost, learningRate, 
dbn.stackedRBM.numLayer - 1)
--- End diff --

 The last layer should also be trained.





[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...

2014-11-19 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/3222#discussion_r20575084
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/neuralNetwork/DBN.scala ---
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.neuralNetwork
+
+import org.apache.spark.Logging
+import org.apache.spark.mllib.linalg.{Vector => SV}
+import org.apache.spark.rdd.RDD
+
+class DBN(val stackedRBM: StackedRBM, val nn: NN)
+  extends Logging with Serializable {
+}
+
+object DBN extends Logging {
+  def train(
+data: RDD[(SV, SV)],
+batchSize: Int,
+numIteration: Int,
+topology: Array[Int],
+fraction: Double,
+momentum: Double,
+weightCost: Double,
+learningRate: Double): DBN = {
+val dbn = initializeDBN(topology)
+pretrain(data, batchSize, numIteration, dbn,
+  fraction, momentum, weightCost, learningRate)
+NN.train(data, batchSize, numIteration, dbn.nn,
+  fraction, momentum, weightCost, learningRate)
+dbn
+  }
+
+  private[mllib] def pretrain(
+data: RDD[(SV, SV)],
+batchSize: Int,
+numIteration: Int,
+dbn: DBN,
+fraction: Double,
+momentum: Double,
+weightCost: Double,
+learningRate: Double): DBN = {
+StackedRBM.train(data.map(_._1), batchSize, numIteration, 
dbn.stackedRBM,
+  fraction, momentum, weightCost, learningRate, 
dbn.stackedRBM.numLayer - 1)
--- End diff --

I see, Thanks.





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-19 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/1290#discussion_r20589805
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala 
---
@@ -0,0 +1,528 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.ann
+
+import breeze.linalg.{DenseVector, Vector => BV, axpy => brzAxpy}
+
+import org.apache.spark.mllib.linalg.{Vector, Vectors}
+import org.apache.spark.mllib.optimization._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.util.random.XORShiftRandom
+
+/*
+ * Implements a Artificial Neural Network (ANN)
+ *
+ * The data consists of an input vector and an output vector, combined 
into a single vector
+ * as follows:
+ *
+ * [ ---input--- ---output--- ]
+ *
+ * NOTE: output values should be in the range [0,1]
+ *
+ * For a network of H hidden layers:
+ *
+ * hiddenLayersTopology(h) indicates the number of nodes in hidden layer 
h, excluding the bias
+ * node. h counts from 0 (first hidden layer, taking inputs from input 
layer) to H - 1 (last
+ * hidden layer, sending outputs to the output layer).
+ *
+ * hiddenLayersTopology is converted internally to topology, which adds 
the number of nodes
+ * in the input and output layers.
+ *
+ * noInput = topology(0), the number of input nodes
+ * noOutput = topology(L-1), the number of output nodes
+ *
+ * input = data( 0 to noInput-1 )
+ * output = data( noInput to noInput + noOutput - 1 )
+ *
+ * W_ijl is the weight from node i in layer l-1 to node j in layer l
+ * W_ijl goes to position ofsWeight(l) + j*(topology(l-1)+1) + i in the 
weights vector
+ *
+ * B_jl is the bias input of node j in layer l
+ * B_jl goes to position ofsWeight(l) + j*(topology(l-1)+1) + 
topology(l-1) in the weights vector
+ *
+ * error function: E( O, Y ) = sum( O_j - Y_j )
+ * (with O = (O_0, ..., O_(noOutput-1)) the output of the ANN,
+ * and (Y_0, ..., Y_(noOutput-1)) the input)
+ *
+ * node_jl is node j in layer l
+ * node_jl goes to position ofsNode(l) + j
+ *
+ * The weights gradient is defined as dE/dW_ijl and dE/dB_jl
+ * It has same mapping as W_ijl and B_jl
+ *
+ * For back propagation:
+ * delta_jl = dE/dS_jl, where S_jl the output of node_jl, but before 
applying the sigmoid
+ * delta_jl has the same mapping as node_jl
+ *
+ * Where E = ((estOutput-output),(estOutput-output)),
+ * the inner product of the difference between estimation and target 
output with itself.
+ *
+ */
+
+/**
+ * Artificial neural network (ANN) model
+ *
+ * @param weights the weights between the neurons in the ANN.
+ * @param topology array containing the number of nodes per layer in the 
network, including
+ * the nodes in the input and output layer, but excluding the bias nodes.
+ */
+class ArtificialNeuralNetworkModel private[mllib](val weights: Vector, val 
topology: Array[Int])
+  extends Serializable with ANNHelper {
+
+  /**
+   * Predicts values for a single data point using the trained model.
+   *
+   * @param testData represents a single data point.
+   * @return prediction using the trained model.
+   */
+  def predict(testData: Vector): Vector = {
+Vectors.dense(computeValues(testData.toArray, weights.toArray))
+  }
+
+  /**
+   * Predict values for an RDD of data points using the trained model.
+   *
+   * @param testDataRDD RDD representing the input vectors.
+   * @return RDD with predictions using the trained model as (input, 
output) pairs.
+   */
+  def predict(testDataRDD: RDD[Vector]): RDD[(Vector,Vector)] = {
+    testDataRDD.map(T => (T, predict(T)))
+  }
+
+  private def computeValues(arrData: Array[Double], arrWeights: 
Array[Double]): Array[Double] = {
+val arrNodes = forwardRun(arrData, arrWeights)
+arrNodes.slice(arrNodes.size - topology(L

[GitHub] spark pull request: [SPARK-4526][MLLIB]GradientDescent get a wrong...

2014-11-20 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/3399

[SPARK-4526][MLLIB]GradientDescent get a wrong gradient value according to 
the gradient formula.

This is caused by the miniBatchSize parameter: the number of records
`RDD.sample` returns is not fixed.
cc @mengxr
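
For context, a small numeric sketch of the effect (my illustration with
made-up numbers, not code from this PR):

```scala
// RDD.sample draws a random (roughly Binomial(n, fraction)) number of
// records, so dividing the gradient sum by the *expected* mini-batch size
// mis-scales the step; the divisor should be the actual sampled count.
val n = 100000L
val miniBatchFraction = 0.01
val expectedSize = n * miniBatchFraction // 1000.0
val actualSize = 973L                    // e.g. what sample() really returned
val gradientSumNorm = 1.0                // stand-in for the summed subgradients
println(gradientSumNorm / expectedSize)  // scaling based on the expectation
println(gradientSumNorm / actualSize)    // scaling the formula calls for
```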

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark GradientDescent

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3399.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3399


commit 606b27a1a6c1e5a1e4c51d01d1f6da9f6ed31524
Author: GuoQiang Li wi...@qq.com
Date:   2014-11-21T06:34:50Z

GradientDescent get a wrong gradient value according to the gradient 
formula, which is caused by the miniBatchSize parameter.







[GitHub] spark pull request: [SPARK-4526][MLLIB]GradientDescent get a wrong...

2014-11-20 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/3399#issuecomment-63934659
  
AmplabJenkins retest this please.





[GitHub] spark pull request: [SPARK-4526][MLLIB]GradientDescent get a wrong...

2014-11-21 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/3399#issuecomment-63941966
  
@mengxr  I'm not sure. In my tests of #3222, the convergence rate of SGD
was lower than expected; it may be affected by this issue.






[GitHub] spark pull request: [SPARK-4526][MLLIB]GradientDescent get a wrong...

2014-11-21 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/3399#discussion_r20754059
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala 
---
@@ -185,25 +184,29 @@ object GradientDescent extends Logging {
       val bcWeights = data.context.broadcast(weights)
       // Sample a subset (fraction miniBatchFraction) of the total data
       // compute and sum up the subgradients on this subset (this is one map-reduce)
-      val (gradientSum, lossSum) = data.sample(false, miniBatchFraction, 42 + i)
-        .treeAggregate((BDV.zeros[Double](n), 0.0))(
-          seqOp = (c, v) => (c, v) match { case ((grad, loss), (label, features)) =>
-            val l = gradient.compute(features, label, bcWeights.value, Vectors.fromBreeze(grad))
-            (grad, loss + l)
+      val (gradientSum, lossSum, miniBatchSize) = data.sample(false, miniBatchFraction, 42 + i)
+        .treeAggregate((BDV.zeros[Double](n), 0.0, 0.0))(
--- End diff --

Yes, it should use a `Long` variable.





[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...

2014-11-21 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/3222#discussion_r20754174
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/neuralNetwork/StackedRBM.scala ---
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.neuralNetwork
+
+import java.util.Random
+
+import scala.collection.JavaConversions._
+
+import breeze.linalg.{DenseVector => BDV, DenseMatrix => BDM, sum => brzSum}
+import breeze.numerics.{sigmoid => brzSigmoid}
+
+import org.apache.spark.broadcast.Broadcast
+import org.apache.spark.Logging
+import org.apache.spark.mllib.linalg.{Vector => SV, DenseVector => SDV}
+import org.apache.spark.mllib.linalg.Vectors
+import org.apache.spark.util.random.XORShiftRandom
+import org.apache.spark.rdd.RDD
+
+class StackedRBM(val innerRBMs: Array[RBM])
+  extends Logging with Serializable {
+  def this(topology: Array[Int]) {
+this(StackedRBM.initializeRBMs(topology))
+  }
+
+  def numLayer = innerRBMs.length
+
+  def numInput = innerRBMs.head.numVisible
+
+  def numOut = innerRBMs.last.numHidden
+
+  def activateHidden(visible: BDM[Double], toLayer: Int): BDM[Double] = {
+var x = visible
+    for (layer <- 0 until toLayer) {
+  x = innerRBMs(layer).activateHidden(x)
+  // x = innerRBMs(layer).bernoulli(x)
--- End diff --

Does this need to be converted to a binary vector?
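
For reference, a sketch of the Bernoulli sampling under discussion (my
illustration, assuming `activateHidden` returns activation probabilities;
not code from this PR):

```scala
import java.util.Random
import breeze.linalg.{DenseMatrix => BDM}

// Turn a matrix of activation probabilities into a binary (0/1) matrix by
// sampling each entry from a Bernoulli distribution with that probability.
def bernoulli(prob: BDM[Double], rand: Random): BDM[Double] =
  prob.map(p => if (rand.nextDouble() < p) 1.0 else 0.0)
```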





[GitHub] spark pull request: [SPARK-4530][MLLIB]GradientDescent get a wrong...

2014-11-25 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/3399#issuecomment-64381732
  
@mengxr  The title has been updated.





[GitHub] spark pull request: Improved build configuration

2014-04-23 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/480#discussion_r11933015
  
--- Diff: pom.xml ---
@@ -506,7 +508,45 @@
      <dependency>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro</artifactId>
-       <version>1.7.4</version>
+       <version>${avro.version}</version>
+       <exclusions>
+         <exclusion>
+           <groupId>org.jboss.netty</groupId>
+           <artifactId>netty</artifactId>
+         </exclusion>
+         <exclusion>
+           <groupId>io.netty</groupId>
+           <artifactId>netty</artifactId>
+         </exclusion>
+       </exclusions>
+     </dependency>
+     <dependency>
+       <groupId>org.apache.avro</groupId>
+       <artifactId>avro-ipc</artifactId>
--- End diff --

spark-hive dependency:
```
[INFO] +- org.apache.hive:hive-serde:jar:0.12.0:compile
[INFO] |  +- org.apache.hive:hive-common:jar:0.12.0:compile
[INFO] |  |  +- org.apache.hive:hive-shims:jar:0.12.0:compile
[INFO] |  |  |  \- commons-logging:commons-logging-api:jar:1.0.4:compile
[INFO] |  |  +- commons-cli:commons-cli:jar:1.2:compile
[INFO] |  |  \- org.apache.commons:commons-compress:jar:1.4.1:compile
[INFO] |  | \- org.tukaani:xz:jar:1.0:compile
[INFO] |  +- org.mockito:mockito-all:jar:1.8.5:test (version managed from 
1.8.2; scope managed from compile)
[INFO] |  +- org.apache.thrift:libfb303:jar:0.9.0:compile
[INFO] |  |  \- org.apache.thrift:libthrift:jar:0.9.0:compile
[INFO] |  | +- org.apache.httpcomponents:httpclient:jar:4.1.3:compile
[INFO] |  | \- org.apache.httpcomponents:httpcore:jar:4.1.3:compile
[INFO] |  +- commons-codec:commons-codec:jar:1.4:compile
[INFO] |  +- org.apache.avro:avro:jar:1.7.4:compile (version managed from 
1.7.1)
[INFO] |  |  \- com.thoughtworks.paranamer:paranamer:jar:2.3:compile
[INFO] |  \- org.apache.avro:avro-mapred:jar:1.7.1:compile
[INFO] | \- org.apache.avro:avro-ipc:jar:1.7.1:compile
[INFO] |+- org.mortbay.jetty:jetty:jar:6.1.26:compile
[INFO] |+- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
[INFO] |+- org.apache.velocity:velocity:jar:1.7:compile
[INFO] |\- org.mortbay.jetty:servlet-api:jar:2.5-20081211:compile
```
spark-streaming-flume dependency:
```
[INFO] +- org.apache.flume:flume-ng-sdk:jar:1.2.0:compile
[INFO] |  +- org.apache.avro:avro:jar:1.7.4:compile
[INFO] |  |  +- org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile
[INFO] |  |  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:compile
[INFO] |  |  +- com.thoughtworks.paranamer:paranamer:jar:2.3:compile
[INFO] |  |  \- org.apache.commons:commons-compress:jar:1.4.1:compile
[INFO] |  | \- org.tukaani:xz:jar:1.0:compile
[INFO] |  +- org.apache.avro:avro-ipc:jar:1.6.3:compile
[INFO] |  |  +- org.mortbay.jetty:jetty:jar:6.1.26:compile
[INFO] |  |  +- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
[INFO] |  |  \- org.apache.velocity:velocity:jar:1.7:compile
[INFO] |  | +- commons-collections:commons-collections:jar:3.2.1:compile
[INFO] |  | \- commons-lang:commons-lang:jar:2.4:compile
```
Note the inconsistent versions these dependencies pull in.




[GitHub] spark pull request: Improved build configuration

2014-04-23 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/480#discussion_r11933105
  
--- Diff: pom.xml ---
@@ -793,6 +833,17 @@
   </build>
 
   <profiles>
+    <!-- SPARK-1121: Adds an explicit dependency on Avro to work around a Hadoop 0.23.X issue -->
+    <profile>
+      <id>hadoop-0.23</id>
--- End diff --

I have not encountered this problem in testing.




[GitHub] spark pull request: SPARK-1119 and other build improvements

2014-04-23 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/502#issuecomment-41237973
  
@berngp @pwendell,
Can we delete `yarn.version` and use only `hadoop.version`? Would this
cause any problems?




[GitHub] spark pull request: SPARK-1119 and other build improvements

2014-04-23 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/502#issuecomment-41239453
  
@berngp 
Most people use the same version for HDFS and YARN.
We could do it like this:
```xml
<hadoop.version>1.0.4</hadoop.version>
<yarn.version>${hadoop.version}</yarn.version>
```
```xml
<profile>
  <id>yarn-alpha</id>
  <properties>
    <hadoop.major.version>2</hadoop.major.version>
    <hadoop.version>0.23.7</hadoop.version>
  </properties>
  <modules>
    <module>yarn</module>
  </modules>
  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-yarn-api</artifactId>
        <version>${yarn.version}</version>
      </dependency>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-yarn-common</artifactId>
        <version>${yarn.version}</version>
      </dependency>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-yarn-client</artifactId>
        <version>${yarn.version}</version>
      </dependency>
    </dependencies>
  </dependencyManagement>
</profile>
```
Most people use `mvn -Pyarn -Dhadoop.version=2.3.0 -DskipTests clean
package`.
Others use `mvn -Pyarn -Dhadoop.version=2.3.0 -DskipTests
-Dyarn.version=0.23.9 clean package`.




[GitHub] spark pull request: Fix SPARK-1609: Executor fails to start when C...

2014-04-24 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/547

Fix SPARK-1609:  Executor fails to start when Command.extraJavaOptions 
contains multiple Java options



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark SPARK-1609

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/547.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #547


commit 8a265b7f1084e8d211833dc31633f1f2a16195c6
Author: witgo wi...@qq.com
Date:   2014-04-24T17:12:10Z

Fix SPARK-1609: Executor fails to start when use spark-submit

commit 86fc4bbae56f937e88595a10a01b3db7770e460b
Author: witgo wi...@qq.com
Date:   2014-04-25T02:51:54Z

bugfix

commit f7c0ab71ceef4023ec2f63f65d0a7d346e989fa0
Author: witgo wi...@qq.com
Date:   2014-04-25T02:55:01Z

bugfix

commit 1185605f34457767909259f83a8e44be7456d7fe
Author: witgo wi...@qq.com
Date:   2014-04-25T03:00:29Z

fix extraJavaOptions split

commit bcf36cb8946fa67c77af8e6ac813808bbb538be0
Author: witgo wi...@qq.com
Date:   2014-04-25T03:04:49Z

Merge branch 'master' of https://github.com/apache/spark into SPARK-1609






[GitHub] spark pull request: Modify spark.ui.killEnabled default is false

2014-04-25 Thread witgo
Github user witgo closed the pull request at:

https://github.com/apache/spark/pull/510




[GitHub] spark pull request: Fix SPARK-1609: Executor fails to start when C...

2014-04-25 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/547#discussion_r12023196
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/CommandUtils.scala ---
@@ -48,7 +48,13 @@ object CommandUtils extends Logging {
   def buildJavaOpts(command: Command, memory: Int, sparkHome: String): Seq[String] = {
     val memoryOpts = Seq(s"-Xms${memory}M", s"-Xmx${memory}M")
     // Note, this will coalesce multiple options into a single command component
-    val extraOpts = command.extraJavaOptions.toSeq
+    val extraOpts = command.extraJavaOptions match {
--- End diff --

Yes, `val extraOpts = command.extraJavaOptions.map(Utils.splitCommandString).getOrElse(Seq())` is better.
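
To see why the raw string must be split, here is a minimal local sketch (the option string below is made up; a plain whitespace split stands in for the quoting-aware `Utils.splitCommandString` referenced above):

```scala
// Sketch only: illustrates the bug, not the actual CommandUtils code path.
val extraJavaOptions: Option[String] = Some("-XX:+PrintGCDetails -Dkey=value")

// The old `.toSeq` hands the whole string to the JVM as ONE argument:
val naive: Seq[String] = extraJavaOptions.toSeq
// => Seq("-XX:+PrintGCDetails -Dkey=value"), so the executor fails to start

// Splitting yields one element per option:
val split: Seq[String] = extraJavaOptions.map(_.split("\\s+").toSeq).getOrElse(Seq())
// => Seq("-XX:+PrintGCDetails", "-Dkey=value")
```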




[GitHub] spark pull request: Fix SPARK-1629: Spark should inline use of com...

2014-04-26 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/569

Fix SPARK-1629: Spark should inline use of commons-lang `SystemUtils.IS_OS_WINDOWS`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark SPARK-1629

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/569.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #569


commit 49e248e7a055c50586bf1f4170c5404566adba23
Author: witgo wi...@qq.com
Date:   2014-04-27T02:44:29Z

Fix SPARK-1629: Spark should inline use of commons-lang 
`SystemUtils.IS_OS_WINDOWS`






[GitHub] spark pull request: Fix SPARK-1629: Spark should inline use of com...

2014-04-26 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/569#discussion_r12027354
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -1056,4 +1055,11 @@ private[spark] object Utils extends Logging {
   def getHadoopFileSystem(path: String): FileSystem = {
 getHadoopFileSystem(new URI(path))
   }
+
+  /**
+   * return true if this is Windows.
+   */
+  def isWindows = Option(System.getProperty("os.name")).
--- End diff --

@srowen
```scala
def isWindows(): Boolean = {
  try {
    val osName = System.getProperty("os.name")
    osName != null && osName.startsWith("Windows")
  } catch {
    case e: SecurityException => false // log a warning and return false
  }
}
```

You think a SecurityException will be thrown here. Why?
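
For reference, the failure mode being asked about looks roughly like this (a sketch with a hypothetical restrictive `SecurityManager`, not anything Spark installs):

```scala
// Under a restrictive SecurityManager, property reads are permission-checked,
// so System.getProperty("os.name") can throw instead of returning a value or null.
val sm = new SecurityManager {
  override def checkPropertyAccess(key: String): Unit =
    throw new SecurityException("property access denied: " + key)
}
// If installed via System.setSecurityManager(sm), getProperty would propagate
// this exception; invoking the check directly shows the effect:
try sm.checkPropertyAccess("os.name")
catch { case e: SecurityException => println("caught: " + e.getMessage) }
```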




[GitHub] spark pull request: improvements spark-submit usage

2014-04-28 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/581

improvements spark-submit usage



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark SPARK-1659

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/581.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #581


commit 0b2cf9856ae68f37c3d2228f7a0d57c3414d760e
Author: witgo wi...@qq.com
Date:   2014-04-28T16:55:44Z

Delete spark-submit obsolete usage: --arg ARG






[GitHub] spark pull request: SPARK-1509: add zipWithIndex zipWithUniqueId m...

2014-04-29 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/423#discussion_r12080480
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
@@ -263,6 +263,26 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
       rdd.zipPartitions(other.rdd)(fn)(other.classTag, fakeClassTag[V]))(fakeClassTag[V])
   }
 
+  /**
+   * Zips this RDD with generated unique Long ids. Items in the kth partition will get ids k, n+k,
+   * 2*n+k, ..., where n is the number of partitions. So there may exist gaps, but this method
+   * won't trigger a spark job, which is different from [[org.apache.spark.rdd.RDD#zipWithIndex]].
+   */
+  def zipWithUniqueId[Long](): JavaPairRDD[T, Long] = {
--- End diff --

When the `[Long]` is removed, the return type is `JavaPairRDD<Integer, Object>`.
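
To make the id scheme in the javadoc concrete, here is a small local sketch (plain Scala collections and made-up data, no Spark):

```scala
// With n partitions, the i-th item of partition k gets id i*n + k.
val n = 3
val partitions = Seq(
  Seq("a", "b"),      // partition 0
  Seq("c"),           // partition 1
  Seq("d", "e", "f")  // partition 2
)
val withIds = partitions.zipWithIndex.flatMap { case (items, k) =>
  items.zipWithIndex.map { case (item, i) => (item, i.toLong * n + k) }
}
// => List((a,0), (b,3), (c,1), (d,2), (e,5), (f,8))
// Ids are unique but have gaps (4, 6, 7 unused), and computing them needs no
// job, unlike zipWithIndex, which must count the elements of earlier partitions.
```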




[GitHub] spark pull request: SPARK-1509: add zipWithIndex zipWithUniqueId m...

2014-04-29 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/423#discussion_r12081268
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
@@ -263,6 +263,26 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
       rdd.zipPartitions(other.rdd)(fn)(other.classTag, fakeClassTag[V]))(fakeClassTag[V])
   }
 
+  /**
+   * Zips this RDD with generated unique Long ids. Items in the kth partition will get ids k, n+k,
+   * 2*n+k, ..., where n is the number of partitions. So there may exist gaps, but this method
+   * won't trigger a spark job, which is different from [[org.apache.spark.rdd.RDD#zipWithIndex]].
+   */
+  def zipWithUniqueId[Long](): JavaPairRDD[T, Long] = {
--- End diff --

Yes, in my test.




[GitHub] spark pull request: Improved build configuration

2014-04-29 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/480#issuecomment-41646101
  
Cool!




[GitHub] spark pull request: SPARK-1509: add zipWithIndex zipWithUniqueId m...

2014-04-29 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/423#discussion_r12081885
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
@@ -263,6 +263,26 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
       rdd.zipPartitions(other.rdd)(fn)(other.classTag, fakeClassTag[V]))(fakeClassTag[V])
   }
 
+  /**
+   * Zips this RDD with generated unique Long ids. Items in the kth partition will get ids k, n+k,
+   * 2*n+k, ..., where n is the number of partitions. So there may exist gaps, but this method
+   * won't trigger a spark job, which is different from [[org.apache.spark.rdd.RDD#zipWithIndex]].
+   */
+  def zipWithUniqueId[Long](): JavaPairRDD[T, Long] = {
--- End diff --

```scala
  def zipWithUniqueId(): JavaPairRDD[T, JLong] = {
    JavaPairRDD.fromRDD(rdd.zipWithUniqueId()).asInstanceOf[JavaPairRDD[T, JLong]]
  }
```
Is this better?




[GitHub] spark pull request: SPARK-1509: add zipWithIndex zipWithUniqueId m...

2014-04-29 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/423#discussion_r12082137
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
@@ -263,6 +263,26 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
       rdd.zipPartitions(other.rdd)(fn)(other.classTag, fakeClassTag[V]))(fakeClassTag[V])
   }
 
+  /**
+   * Zips this RDD with generated unique Long ids. Items in the kth partition will get ids k, n+k,
+   * 2*n+k, ..., where n is the number of partitions. So there may exist gaps, but this method
+   * won't trigger a spark job, which is different from [[org.apache.spark.rdd.RDD#zipWithIndex]].
+   */
+  def zipWithUniqueId[Long](): JavaPairRDD[T, Long] = {
--- End diff --

@rxin You're right; it has been modified.

@mengxr
```scala
def zipWithUniqueId(): JavaPairRDD[T, java.lang.Long] = {
  JavaPairRDD.fromRDD(rdd.zipWithUniqueId().map(x => (x._1, new java.lang.Long(x._2))))
}
```
creates too many objects.
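
A side-by-side sketch of the two conversions (hypothetical element type `String`; the cast variant mirrors the snippet further up and assumes the `Long` values are already boxed inside the generic tuples at runtime):

```scala
import org.apache.spark.rdd.RDD

// Map-based conversion: allocates a fresh java.lang.Long wrapper per element.
def viaMap(rdd: RDD[(String, Long)]): RDD[(String, java.lang.Long)] =
  rdd.map { case (k, v) => (k, new java.lang.Long(v)) }

// Cast-based conversion: creates no new objects; it only reinterprets the
// element type, since the Longs are boxed inside each Tuple2 anyway.
def viaCast(rdd: RDD[(String, Long)]): RDD[(String, java.lang.Long)] =
  rdd.asInstanceOf[RDD[(String, java.lang.Long)]]
```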




[GitHub] spark pull request: Improved build configuration Ⅱ

2014-04-29 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/590

Improved build configuration Ⅱ

@berngp
I merged your code into this PR.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark improved_build

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/590.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #590


commit 4e96c0153063b35fc03e497f28292a97832e81d4
Author: Bernardo Gomez Palacio bernardo.gomezpala...@gmail.com
Date:   2014-04-15T21:03:30Z

Add YARN/Stable compiled classes to the CLASSPATH.

The change adds the `./yarn/stable/target/scala-version/classes` to
the _Classpath_ when a _dependencies_ assembly is available at the
assembly directory.

Why is this change necessary?
Eases developing features and bug fixes for Spark-YARN.

[ticket: X] : NA

Author  : bernardo.gomezpala...@gmail.com
Reviewer: ?
Testing : ?

commit 1342886a396be00eda9449c6d84155dfecf954c8
Author: Bernardo Gomez Palacio bernardo.gomezpala...@gmail.com
Date:   2014-04-15T21:46:44Z

The `spark-class` shell now ignores non-jar files in the assembly directory.

Why is this change necessary?

While developing in Spark I found myself rebuilding either the
dependencies assembly or the full spark assembly. I kept running into
the case of having both the dep-assembly and full-assembly in the same
directory and getting an error when I called either `spark-shell` or
`spark-submit`.

Quick fix: rename either of them to a .bkp file depending on the development
workflow you are executing at the moment, and enable `spark-class` to ignore
non-jar files. Another option could be to move the offending jar to a
different directory, but in my opinion keeping them in there is a bit tidier.

e.g.

```
ll ./assembly/target/scala-2.10
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0.jar.bkp
```

[ticket: X] : ?

commit ddf2547aa2aea8155f8d6c0386e2cb37bcf61537
Author: Bernardo Gomez Palacio bernardo.gomezpala...@gmail.com
Date:   2014-04-15T21:53:23Z

The `spark-shell` option `--log-conf` also enables SPARK_PRINT_LAUNCH_COMMAND.

Why is this change necessary?
Most likely when enabling the `--log-conf` through the `spark-shell` you
are also interested on the full invocation of the java command including the
_classpath_ and extended options. e.g.

```
INFO: Base Directory set to /Users/bernardo/work/github/berngp/spark
INFO: Spark Master is yarn-client
INFO: Spark REPL options   -Dspark.logConf=true
Spark Command: 
/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java -cp 
:/Users/bernardo/work/github/berngp/spark/conf:/Users/bernardo/work/github/berngp/spark/core/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/repl/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/mllib/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/bagel/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/graphx/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/streaming/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/tools/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/catalyst/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/core/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/hive/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/yarn/stable/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar:/usr/local/Cellar/hadoop/2.2.0/libexec/etc/hadoop
 -XX:ErrorFile=/tmp/spark-shell-hs_err_pid.log 
-XX:HeapDumpPath=/tmp/spark-shell-java_pid.hprof 
-XX:-HeapDumpOnOutOfMemoryError -XX:-PrintGC -XX:-PrintGCDetails 
-XX:-PrintGCTimeStamps -XX:-PrintTenuringDistribution 
-XX:-PrintAdaptiveSizePolicy -XX:GCLogFileSize=1024K -XX:-UseGCLogFileRotation 
-Xloggc:/tmp/spark-shell-gc.log -XX:+UseConcMarkSweepGC 
-Dspark.cleaner.ttl=1 -Dspark.driver.host=33.33.33.1 -Dspark.logConf=true 
-Djava.library.path= -Xms400M -Xmx400M org.apache.spark.repl.Main
```

[ticket: X] : ?

commit 22045394955992c2c8dfe0e1040c6bb972be6ce4
Author: Bernardo Gomez Palacio bernardo.gomezpala...@gmail.com
Date:   2014-04-15T22:15:23Z

Root is now Spark and qualify the assembly if it was built with YARN.

Why is this change necessary?
Renamed the SBT root project to spark to enhance readability.

Currently the assembly is qualified with the Hadoop Version

[GitHub] spark pull request: [WIP] Improved build configuration III

2014-04-30 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/598

[WIP] Improved build configuration III



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark sql-pom

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/598.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #598


commit 3d175194f27c86a605a9b65bbef2e51a551178e7
Author: witgo wi...@qq.com
Date:   2014-04-30T08:32:23Z

Improved build configuration III






[GitHub] spark pull request: [WIP] Improved build configuration III

2014-04-30 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/598#issuecomment-41810943
  
@pwendell
Now I have a very radical idea: removing sbt support. What problems would that cause?




[GitHub] spark pull request: Improved build configuration Ⅱ

2014-05-01 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/590#discussion_r12181447
  
--- Diff: project/SparkBuild.scala ---
@@ -55,7 +55,7 @@ object SparkBuild extends Build {
   val SCALAC_JVM_VERSION = "jvm-1.6"
   val JAVAC_JVM_VERSION = "1.6"
 
-  lazy val root = Project("root", file("."), settings = rootSettings) aggregate(allProjects: _*)
+  lazy val root = Project("spark", file("."), settings = rootSettings) aggregate(allProjects: _*)
--- End diff --

Just to increase readability.




[GitHub] spark pull request: Improved build configuration Ⅱ

2014-05-01 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/590#issuecomment-41890856
  
@pwendell
I have removed the Travis changes.




[GitHub] spark pull request: [SPARK-1681] Include datanucleus jars in Spark...

2014-05-01 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/610#issuecomment-41911017
  
There is another solution: #598.




[GitHub] spark pull request: SPARK-1695: java8-tests compiler error: packag...

2014-05-01 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/611

SPARK-1695: java8-tests compiler error: package com.google.common.collections does not exist

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark SPARK-1695

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/611.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #611


commit d77a8875f30d460bdd5e301e30beb88d11fa5138
Author: witgo wi...@qq.com
Date:   2014-05-01T16:03:08Z

Fix SPARK-1695: java8-tests compiler error: package 
com.google.common.collections does not exist






[GitHub] spark pull request: Improved build configuration Ⅱ

2014-05-02 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/590#issuecomment-42094359
  
@tgravescs 
I tested many times; these all pass:
`mvn clean package -DskipTests -Pyarn-alpha -Dhadoop.version=0.23.7 -Phadoop-0.23`
`mvn clean package -DskipTests -Pyarn-alpha -Dhadoop.version=0.23.9 -Phadoop-0.23`
`mvn clean package -DskipTests -Pyarn-alpha -Dhadoop.version=0.23.7 -Phadoop-0.23 -Dyarn.version=0.23.10`
Is your code not the latest?




[GitHub] spark pull request: SPARK-1699: Python relative independence from ...

2014-05-03 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/624

SPARK-1699: Python relative independence from the core, becomes subprojects



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark python-api

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/624.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #624


commit d9a31db82b30ebfa6c27227507e2e20bb1e8d08a
Author: witgo wi...@qq.com
Date:   2014-05-03T06:20:52Z

SPARK-1699: Python relative should be independence from the core, becomes 
subprojects






[GitHub] spark pull request: Improved build configuration Ⅱ

2014-05-03 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/590#issuecomment-42098647
  
@pwendell How about [this solution](https://github.com/witgo/spark/commit/0ed124dc0e453a0a59d3c387651be970859a9a0a)? It only excludes the servlet-api 2.5 dependency.




[GitHub] spark pull request: add yarn.version for profile yarn and yarn-alp...

2014-05-03 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/625#issuecomment-42102176
  
[PR 590](https://github.com/apache/spark/pull/590) contains the relevant changes.




[GitHub] spark pull request: Improve build configuration Ⅱ

2014-05-03 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/590#issuecomment-42102979
  
Hi @pwendell, @srowen,
All the changes are very small, and [this solution](https://github.com/witgo/spark/commit/0ed124dc0e453a0a59d3c387651be970859a9a0a) only applies to hadoop 2.3.x and 2.4.x; it can be merged into 1.0.

Your views?




[GitHub] spark pull request: The default version of yarn is equal to the ha...

2014-05-03 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/626

The default version of yarn is equal to the hadoop version

This is a part of [PR 590](https://github.com/apache/spark/pull/590)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark yarn_version

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/626.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #626


commit c76763b875beedba0a144efe1d3b814cfc8b811b
Author: witgo wi...@qq.com
Date:   2014-05-03T13:57:09Z

The default value of yarn.version is equal to hadoop.version






[GitHub] spark pull request: [WIP] SPARK-1699: Python relative independence...

2014-05-03 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/624#issuecomment-42107623
  
The branch is wrong; temporarily closed.




[GitHub] spark pull request: [WIP] SPARK-1699: Python relative independence...

2014-05-03 Thread witgo
Github user witgo closed the pull request at:

https://github.com/apache/spark/pull/624




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-05-03 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-42109604
  
@srowen Not everyone uses the same version of HDFS and YARN.




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-05-03 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-42110042
  
@srowen Related discussion in [PR 502](https://github.com/apache/spark/pull/502).
@berngp Can you explain the reason for not using the same version of HDFS and YARN?




[GitHub] spark pull request: Improve build configuration Ⅱ

2014-05-03 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/590#issuecomment-42120300
  
@pwendell
I did not notice this; it has been modified.




[GitHub] spark pull request: The default version of yarn is equal to the ha...

2014-05-03 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/626#discussion_r12259307
  
--- Diff: pom.xml ---
@@ -558,65 +560,8 @@
         <artifactId>jets3t</artifactId>
         <version>0.7.1</version>
       </dependency>
-      <dependency>
--- End diff --

You're right, but with `mvn -Pyarn clean package` the Hadoop version is 2.2.0.




[GitHub] spark pull request: The default version of yarn is equal to the ha...

2014-05-03 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/626#discussion_r12259340
  
--- Diff: pom.xml ---
@@ -558,65 +560,8 @@
         <artifactId>jets3t</artifactId>
         <version>0.7.1</version>
       </dependency>
-      <dependency>
--- End diff --

|maven|hadoop.version|yarn.version|
|:---|:---:|:---:|
|`mvn -Pyarn -DskipTests clean package`|2.2.0|2.2.0|
|`mvn -Phadoop-0.23 -Pyarn-alpha -DskipTests clean package`|0.23.7|0.23.7|
|`mvn -Pyarn-alpha -Dhadoop.version=2.0.0-cdh4.2.0 -DskipTests clean package`|2.0.0-cdh4.2.0|2.0.0-cdh4.2.0|
|`mvn -Phadoop-0.23 -Pyarn-alpha -Dhadoop.version=2.3.0 -Dyarn.version=0.23.7 -DskipTests clean package`|2.3.0|0.23.7|
|`mvn -DskipTests clean package`|1.0.4|not supported|
|`mvn -Pyarn-alpha -Dyarn.version=0.23.7 -Dhadoop.version=1.0.4 -Phadoop-0.23 -DskipTests package`|1.0.4|0.23.7|





[GitHub] spark pull request: The default version of yarn is equal to the ha...

2014-05-03 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/626#discussion_r12259611
  
--- Diff: pom.xml ---
@@ -558,65 +560,8 @@
         <artifactId>jets3t</artifactId>
         <version>0.7.1</version>
       </dependency>
-      <dependency>
--- End diff --

With `mvn -DskipTests clean package`, the dependency declarations of `hadoop-yarn-api`, `hadoop-yarn-common`, and `hadoop-yarn-client` are not necessary.




[GitHub] spark pull request: The default version of yarn is equal to the ha...

2014-05-03 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/626#discussion_r12259715
  
--- Diff: pom.xml ---
@@ -558,65 +560,8 @@
         <artifactId>jets3t</artifactId>
         <version>0.7.1</version>
       </dependency>
-      <dependency>
--- End diff --

When `hadoop.version` is 1.0.4, `yarn.version` is also 1.0.4, so

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-yarn-client</artifactId>
    <version>${yarn.version}</version>
  </dependency>

is not correct.





[GitHub] spark pull request: SPARK-1693: Most of the tests throw a java.lan...

2014-05-04 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/628

SPARK-1693: Most of the tests throw a java.lang.SecurityException when spark built for hadoop 2.3.0, 2.4.0

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark SPARK-1693_new

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/628.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #628


commit dc63905908cb7c84c741bb5fdc4ad7d4abdcb0b2
Author: witgo wi...@qq.com
Date:   2014-05-04T06:43:43Z

SPARK-1693: Most of the tests throw a java.lang.SecurityException when 
spark built for hadoop 2.3.0 , 2.4.0






[GitHub] spark pull request: SPARK-1556. jets3t dep doesn't update properly...

2014-05-04 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/629#discussion_r12261160
  
--- Diff: core/pom.xml ---
@@ -38,12 +38,6 @@
     <dependency>
       <groupId>net.java.dev.jets3t</groupId>
       <artifactId>jets3t</artifactId>
-      <exclusions>
-        <exclusion>
-          <groupId>commons-logging</groupId>
-          <artifactId>commons-logging</artifactId>
-        </exclusion>
-      </exclusions>
--- End diff --

Why remove it?




[GitHub] spark pull request: SPARK-1556. jets3t dep doesn't update properly...

2014-05-04 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/629#issuecomment-42132419
  
Looks good to me.




[GitHub] spark pull request: SPARK-1699: Python relative independent, becom...

2014-05-04 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/631

SPARK-1699: Python relative independent, becomes a subproject



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark SPARK-1699

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/631.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #631


commit 74ffefb453dd14f49d88c7a7b8a406b82f325c56
Author: witgo wi...@qq.com
Date:   2014-05-04T16:05:44Z

SPARK-1699: Python relative independence from the core, becomes subprojects






[GitHub] spark pull request: Add missing description to spark-env.sh.templa...

2014-05-05 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/646

Add missing description to spark-env.sh.template



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark spark_env

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/646.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #646


commit 9a95a564593ad4071486abb51750cbf6c9b921ff
Author: witgo wi...@qq.com
Date:   2014-05-05T10:25:04Z

Add missing description to spark-env.sh.template






[GitHub] spark pull request: SPARK-1734: spark-submit throws an exception: ...

2014-05-06 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/665

SPARK-1734: spark-submit throws an exception: Exception in thread "main" java.lang.ClassNotFoundException: org.apache.spark.broadcast.TorrentBroadcastFactory

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark SPARK-1734

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/665.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #665


commit cacf23852027fb3d0fb5a020f1e9216bba0468d3
Author: witgo wi...@qq.com
Date:   2014-05-06T09:13:49Z

SPARK-1734: spark-submit throws an exception: Exception in thread main 
java.lang.ClassNotFoundException: 
org.apache.spark.broadcast.TorrentBroadcastFactory






[GitHub] spark pull request: SPARK-1699: Python relative independent, becom...

2014-05-07 Thread witgo
Github user witgo closed the pull request at:

https://github.com/apache/spark/pull/631




[GitHub] spark pull request: SPARK-1712: TaskDescription instance is too bi...

2014-05-07 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/677#discussion_r12364638
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -414,6 +415,14 @@ private[spark] class TaskSetManager(
       // we assume the task can be serialized without exceptions.
       val serializedTask = Task.serializeWithDependencies(
         task, sched.sc.addedFiles, sched.sc.addedJars, ser)
+      if (serializedTask.limit >= akkaFrameSize - 1024) {
--- End diff --

`serializedTask` = `4356 bytes`
`LaunchTask` = `4797 bytes`
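
For scale, a sketch of the guard in the diff above (the 10 MB frame size is an assumed default for `spark.akka.frameSize`; 4356 bytes is the task measured here, and the 1024-byte reserve covers the LaunchTask envelope, 4797 - 4356 = 441 bytes in this measurement):

```scala
import java.nio.ByteBuffer

val akkaFrameSize = 10 * 1024 * 1024            // assumed frame size in bytes
val serializedTask = ByteBuffer.allocate(4356)  // the task size quoted above

// The guard leaves headroom for the LaunchTask wrapper around the task bytes:
if (serializedTask.limit >= akkaFrameSize - 1024) {
  println(s"Task of ${serializedTask.limit} bytes would exceed the Akka frame size")
} else {
  println("Task fits within the frame; safe to send")
}
```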




[GitHub] spark pull request: SPARK-1712: TaskDescription instance is too bi...

2014-05-07 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/677#discussion_r12363925
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -414,6 +415,14 @@ private[spark] class TaskSetManager(
       // we assume the task can be serialized without exceptions.
       val serializedTask = Task.serializeWithDependencies(
         task, sched.sc.addedFiles, sched.sc.addedJars, ser)
+      if (serializedTask.limit >= akkaFrameSize - 1024) {
--- End diff --

The reference [Executor.scala#L235](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L235) may not fit here.




[GitHub] spark pull request: improve the readability of SparkContext.scala

2014-05-07 Thread witgo
Github user witgo closed the pull request at:

https://github.com/apache/spark/pull/414




[GitHub] spark pull request: SPARK-1712: TaskDescription instance is too bi...

2014-05-07 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/677#discussion_r12364078
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -414,6 +415,14 @@ private[spark] class TaskSetManager(
       // we assume the task can be serialized without exceptions.
       val serializedTask = Task.serializeWithDependencies(
         task, sched.sc.addedFiles, sched.sc.addedJars, ser)
+      if (serializedTask.limit >= akkaFrameSize - 1024) {
+        val msg = "Serialized task %s:%d were %d bytes which " +
--- End diff --

The reason for this is to keep the code style consistent.




[GitHub] spark pull request: SPARK-1712: TaskDescription instance is too bi...

2014-05-07 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/677

SPARK-1712: TaskDescription instance is too big causes Spark to hang



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark SPARK-1712

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/677.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #677


commit e6578400ce58104d2b022f62110ac83f82a92872
Author: witgo wi...@qq.com
Date:   2014-05-07T05:12:34Z

SPARK-1712: TaskDescription instance is too big causes Spark to hang






[GitHub] spark pull request: [WIP] update scalatest to version 2.1.5

2014-05-10 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/713

[WIP] update scalatest to version 2.1.5



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark scalatest

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/713.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #713


commit c4589286f534c6e720954c0433903643c73d201e
Author: witgo wi...@qq.com
Date:   2014-05-09T03:16:50Z

update scalatest to version 2.1.5

commit 2c543b93fb3eb67b0e88e8fdeb5380731e68651c
Author: witgo wi...@qq.com
Date:   2014-05-09T05:27:23Z

fix ReplSuite.scala






[GitHub] spark pull request: [WIP]SPARK-1712: TaskDescription instance is t...

2014-05-10 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/677#issuecomment-42442064
  
@pwendell
How about [this 
solution](https://github.com/witgo/spark/compare/SPARK-1712_new)?




[GitHub] spark pull request: fix building spark with maven documentation

2014-05-10 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/712

fix building spark with maven documentation



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark building-with-maven

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/712.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #712


commit 215523bdcd50379538a204d256a4dbdaab5a8db7
Author: witgo wi...@qq.com
Date:   2014-05-09T08:34:40Z

fix building spark with maven documentation






[GitHub] spark pull request: 【SPARK-1779】add warning when memoryFractio...

2014-05-10 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/714#issuecomment-42729492
  
【SPARK-1779】 => [SPARK-1779]




[GitHub] spark pull request: [SPARK-1644] The org.datanucleus:* should not ...

2014-05-10 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/688#issuecomment-42730561
  
@pwendell
It has been updated.




[GitHub] spark pull request: SPARK-1756: Add missing description to spark-e...

2014-05-10 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/646#discussion_r12507330
  
--- Diff: conf/spark-env.sh.template ---
@@ -38,6 +38,7 @@
 # - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
 # - SPARK_WORKER_DIR, to set the working directory of worker processes
 # - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. -Dx=y)
+# - SPARK_DRIVER_MEMORY, Memory for driver (e.g. 1000M, 2G) (Default: 512 Mb)
--- End diff --

`./bin/spark-shell --driver-memory 2g` =>
```
/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java -cp ::/Users/witgo/work/code/java/spark/dist/conf:/Users/witgo/work/code/java/spark/dist/lib/spark-assembly-1.0.0-SNAPSHOT-hadoop0.23.9.jar -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit spark-internal --driver-memory 2g --class org.apache.spark.repl.Main
```





[GitHub] spark pull request: remove outdated runtime Information scala home

2014-05-10 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/728

remove outdated runtime Information scala home



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark scala_home

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/728.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #728


commit fac094ad2b68415285d85c67754deda4e2bee116
Author: witgo wi...@qq.com
Date:   2014-05-11T04:27:31Z

remove outdated runtime Information scala home






[GitHub] spark pull request: SPARK-1756: Add missing description to spark-e...

2014-05-11 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/646#discussion_r12507598
  
--- Diff: conf/spark-env.sh.template ---
@@ -38,6 +38,7 @@
 # - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
 # - SPARK_WORKER_DIR, to set the working directory of worker processes
 # - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. -Dx=y)
+# - SPARK_DRIVER_MEMORY, Memory for driver (e.g. 1000M, 2G) (Default: 512 Mb)
--- End diff --

If so,
```
if [ ! -z $DRIVER_MEMORY ] && [ ! -z $DEPLOY_MODE ] && [ $DEPLOY_MODE = "client" ]; then
  export SPARK_MEM=$DRIVER_MEMORY
fi
```
is not correct.




[GitHub] spark pull request: SPARK-1756: Add missing description to spark-e...

2014-05-11 Thread witgo
Github user witgo closed the pull request at:

https://github.com/apache/spark/pull/646




[GitHub] spark pull request: SPARK-1756: Add missing description to spark-e...

2014-05-11 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/646#discussion_r12507619
  
--- Diff: conf/spark-env.sh.template ---
@@ -38,6 +38,7 @@
 # - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
 # - SPARK_WORKER_DIR, to set the working directory of worker processes
 # - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. -Dx=y)
+# - SPARK_DRIVER_MEMORY, Memory for driver (e.g. 1000M, 2G) (Default: 512 Mb)
--- End diff --

Yes, it works for me.
`./bin/spark-shell --driver-memory 2g` =>
```
/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java -cp ::/Users/witgo/work/code/java/spark/dist/conf:/Users/witgo/work/code/java/spark/dist/lib/spark-assembly-1.0.0-SNAPSHOT-hadoop0.23.9.jar -Djava.library.path= -Xms2g -Xmx2g org.apache.spark.deploy.SparkSubmit spark-internal --driver-memory 2g --class org.apache.spark.repl.Main
```
But in `--driver-memory 2g --class org.apache.spark.repl.Main`, `--driver-memory 2g` is unnecessary.



