[GitHub] spark pull request: [SQL] Implement Describe Table for SQLContext
Github user yanbohappy closed the pull request at: https://github.com/apache/spark/pull/4207
[GitHub] spark pull request: [SQL] Implement Describe Table for SQLContext
Github user yanbohappy commented on the pull request: https://github.com/apache/spark/pull/4207#issuecomment-71794484 @OopsOutOfMemory Since you have gone deep into this issue, and I agree your PR is more mature, I will close this one.
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/3732#discussion_r23670603

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala ---
@@ -252,7 +252,7 @@ trait Row extends Serializable {
    *
    * @throws ClassCastException when data type does not match.
    */
-  def getDate(i: Int): java.sql.Date = apply(i).asInstanceOf[java.sql.Date]
+  def getDate(i: Int): java.sql.Date = DateUtils.toJavaDate(getInt(i))
--- End diff --
```

Oh, sure...
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/3732#discussion_r23670522

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala ---
@@ -252,7 +252,7 @@ trait Row extends Serializable {
    *
    * @throws ClassCastException when data type does not match.
    */
-  def getDate(i: Int): java.sql.Date = apply(i).asInstanceOf[java.sql.Date]
+  def getDate(i: Int): java.sql.Date = DateUtils.toJavaDate(getInt(i))
--- End diff --
```

This line should be reverted since you changed ScalaReflection.convertRowToScala, right?
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3732#issuecomment-71794073 [Test build #26213 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26213/consoleFull) for PR 3732 at commit [`c37832b`](https://github.com/apache/spark/commit/c37832bc3a48493639b7a74d3277c11349942526). * This patch merges cleanly.
[GitHub] spark pull request: [SQL] Implement Describe Table for SQLContext
Github user yanbohappy commented on the pull request: https://github.com/apache/spark/pull/4207#issuecomment-71793953 @lianhuiwang In PR https://github.com/apache/spark/pull/3948, CommandStrategy has been removed and the commands have been refactored.
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3715#issuecomment-71793808 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26209/ Test FAILed.
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/3732#discussion_r23670356

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala ---
@@ -252,7 +252,7 @@ trait Row extends Serializable {
    *
    * @throws ClassCastException when data type does not match.
    */
-  def getDate(i: Int): java.sql.Date = apply(i).asInstanceOf[java.sql.Date]
+  def getDate(i: Int): java.sql.Date = DateUtils.toJavaDate(getInt(i))
--- End diff --
```

Thanks, code updated.
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3715#issuecomment-71793802 [Test build #26209 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26209/consoleFull) for PR 3715 at commit [`23b039a`](https://github.com/apache/spark/commit/23b039a896497c8f4cae1bf963274ff295841c37).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class KafkaUtils(object):`
[GitHub] spark pull request: [SPARK-5444][Network]Add a retry to deal with ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4240#issuecomment-71793668 [Test build #26212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26212/consoleFull) for PR 4240 at commit [`cc926d2`](https://github.com/apache/spark/commit/cc926d2d4f737dd76a9fa593c0f93b183d2ca21f). * This patch merges cleanly.
[GitHub] spark pull request: Add a retry to deal with the conflict port in ...
GitHub user SaintBacchus opened a pull request:

https://github.com/apache/spark/pull/4240

Add a retry to deal with the conflict port in netty server.

If `spark.blockManager.port` conflicts with a port that is already in use, Spark throws an exception and exits, so this adds a retry to avoid that situation.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SaintBacchus/spark NettyPortConflict

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4240.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #4240

commit cc926d2d4f737dd76a9fa593c0f93b183d2ca21f
Author: huangzhaowei
Date: 2015-01-28T06:21:27Z

    Add a retry to deal with the conflict port in netty server.
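For illustration, the retry strategy described above can be sketched like this (a minimal standalone sketch using a plain `ServerSocket` in place of the actual Netty server; the object name and the `retriesLeft` default are assumptions, not the PR's code):

```scala
import java.net.{BindException, ServerSocket}

object BindRetrySketch {
  // Try the requested port first; on a conflict, step to the next port
  // until a bind succeeds or the retry budget is exhausted.
  def bindWithRetry(port: Int, retriesLeft: Int = 16): ServerSocket =
    try new ServerSocket(port)
    catch {
      case _: BindException if retriesLeft > 0 =>
        println(s"Port $port already in use, trying port ${port + 1}")
        bindWithRetry(port + 1, retriesLeft - 1)
    }
}
```

When the budget runs out, the final `BindException` simply propagates, which preserves the previous fail-fast behavior as a last resort.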
[GitHub] spark pull request: Don't return `ERROR 500` when have missing arg...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4239#issuecomment-71792811 Can one of the admins verify this patch?
[GitHub] spark pull request: Don't return `ERROR 500` when have missing arg...
GitHub user catap opened a pull request:

https://github.com/apache/spark/pull/4239

Don't return `ERROR 500` when have missing args

The Spark web UI returns `HTTP ERROR 500` when a GET argument is missing.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/catap/spark ui_500

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4239.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #4239

commit 4faba92526e93f44c11962724180e8e201015e7a
Author: Kirill A. Korinskiy
Date: 2015-01-28T07:26:55Z

    Don't return `ERROR 500` when have missing args

    Spark web UI return `HTTP ERROR 500` when GET arguments is missing.
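The general fix pattern for this class of bug is to validate GET parameters up front and answer with a 400-class status instead of letting a missing argument surface as an uncaught exception (HTTP 500). A hedged sketch of that pattern (hypothetical helper; not necessarily what the PR itself does):

```scala
import javax.servlet.http.{HttpServletRequest, HttpServletResponse}

object RequiredParamSketch {
  // Fetch a required GET parameter; send 400 Bad Request when it is absent,
  // rather than failing later with a NullPointerException (HTTP 500).
  def requiredParam(req: HttpServletRequest,
                    resp: HttpServletResponse,
                    name: String): Option[String] =
    Option(req.getParameter(name)).orElse {
      resp.sendError(HttpServletResponse.SC_BAD_REQUEST,
        s"Missing required request parameter: $name")
      None
    }
}
```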
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/3732#discussion_r23669844

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala ---
@@ -252,7 +252,7 @@ trait Row extends Serializable {
    *
    * @throws ClassCastException when data type does not match.
    */
-  def getDate(i: Int): java.sql.Date = apply(i).asInstanceOf[java.sql.Date]
+  def getDate(i: Int): java.sql.Date = DateUtils.toJavaDate(getInt(i))
--- End diff --
```

You can change the one in ScalaReflection.convertRowToScala to make it work for both Scala and Java.
[GitHub] spark pull request: [SPARK-5395] [PySpark] fix python process leak...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4238#issuecomment-71792089 [Test build #26211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26211/consoleFull) for PR 4238 at commit [`24ed322`](https://github.com/apache/spark/commit/24ed3223f96ec8a2c93fe01f51e846b3e8d92c54). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/3732#discussion_r23669693

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala ---
@@ -252,7 +252,7 @@ trait Row extends Serializable {
    *
    * @throws ClassCastException when data type does not match.
    */
-  def getDate(i: Int): java.sql.Date = apply(i).asInstanceOf[java.sql.Date]
+  def getDate(i: Int): java.sql.Date = DateUtils.toJavaDate(getInt(i))
--- End diff --
```

Now I added the conversion in DataTypeConversion, which only applies to the Java classes. For Scala, we need to write `Row(DateUtils.fromJavaDate(...))`. Is this OK with you?
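For readers skimming the thread: SPARK-4508 stores dates internally as an `Int` counting days since the Unix epoch, and `DateUtils` converts to and from `java.sql.Date` at the API boundary. A minimal sketch of that encoding (hypothetical object name; the real `DateUtils` also accounts for the local time zone, which this UTC-only toy ignores):

```scala
import java.sql.Date
import java.util.concurrent.TimeUnit

object DateCodecSketch {
  private val MillisPerDay = TimeUnit.DAYS.toMillis(1)

  // Decode the internal Int (days since 1970-01-01) into a java.sql.Date,
  // which is what the new Row.getDate in the diff above must do.
  def toJavaDate(daysSinceEpoch: Int): Date = new Date(daysSinceEpoch * MillisPerDay)

  // Encode a java.sql.Date back into days since the epoch, as when
  // constructing a Row from external values.
  def fromJavaDate(date: Date): Int = (date.getTime / MillisPerDay).toInt
}
```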
[GitHub] spark pull request: [SPARK-5395] [PySpark] fix python process leak...
GitHub user davies opened a pull request:

https://github.com/apache/spark/pull/4238

[SPARK-5395] [PySpark] fix python process leak while coalesce()

Currently, the Python worker process is released back into the pool only after the task has finished, which causes many processes to be forked when coalesce() is called. This PR changes it to release the process as soon as all the data has been read from it (that is, when the partition is finished), so that one process can be reused to process multiple partitions in a single task.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark py_leak

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4238.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #4238

commit 24ed3223f96ec8a2c93fe01f51e846b3e8d92c54
Author: Davies Liu
Date: 2015-01-28T07:21:55Z

    fix python process leak while coalesce()
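Conceptually, the change moves the pool release from the end of the task to the end of the partition iterator. A simplified sketch of that idea (hypothetical `WorkerPool` and method names, not the actual PythonRDD code):

```scala
import scala.collection.mutable

// Hypothetical pool of reusable worker handles.
class WorkerPool[W](create: () => W) {
  private val idle = mutable.Queue.empty[W]
  def borrow(): W = if (idle.nonEmpty) idle.dequeue() else create()
  def release(worker: W): Unit = idle.enqueue(worker)
}

object ReleaseEarlySketch {
  // Wrap a partition iterator so the worker returns to the pool as soon as
  // the partition is exhausted, instead of waiting for the task to finish.
  def withEarlyRelease[W, T](pool: WorkerPool[W], worker: W,
                             underlying: Iterator[T]): Iterator[T] =
    new Iterator[T] {
      private var released = false
      def hasNext: Boolean = {
        val more = underlying.hasNext
        if (!more && !released) { pool.release(worker); released = true }
        more
      }
      def next(): T = underlying.next()
    }
}
```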
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3732#issuecomment-71791681 [Test build #26210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26210/consoleFull) for PR 3732 at commit [`f0005b1`](https://github.com/apache/spark/commit/f0005b166a705f7b1c52960b72c4ff29d010e5ff). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4387][PySpark] Refactoring python profi...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/3901#issuecomment-71789766 @JoshRosen Should we include this in 1.3?
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3715#issuecomment-71789728 [Test build #26209 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26209/consoleFull) for PR 3715 at commit [`23b039a`](https://github.com/apache/spark/commit/23b039a896497c8f4cae1bf963274ff295841c37). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/3715#discussion_r23668817

```diff
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
@@ -374,49 +375,63 @@ private[spark] object PythonRDD extends Logging {
     // The right way to implement this would be to use TypeTags to get the full
     // type of T. Since I don't want to introduce breaking changes throughout the
     // entire Spark API, I have to use this hacky approach:
--- End diff --
```

fixed
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/3715#discussion_r23668764

```diff
--- Diff: make-distribution.sh ---
@@ -188,6 +188,7 @@ echo "Build flags: $@" >> "$DISTDIR/RELEASE"
 # Copy jars
 cp "$SPARK_HOME"/assembly/target/scala*/*assembly*hadoop*.jar "$DISTDIR/lib/"
 cp "$SPARK_HOME"/examples/target/scala*/spark-examples*.jar "$DISTDIR/lib/"
+cp "$SPARK_HOME"/external/kafka/scala*/*kafka*assembly*.jar "$DISTDIR/lib/"
--- End diff --
```

The motivation for the assembly jars is to simplify the process for Python programmers; can #4215 help in this case?
[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71788951 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26208/ Test PASSed.
[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71788946 [Test build #26208 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26208/consoleFull) for PR 3951 at commit [`6e4ead8`](https://github.com/apache/spark/commit/6e4ead88855170a13806274d3103cbc4bc2a8563).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class TreeEnsembleModel(JavaModelWrapper):`
  * `class DecisionTreeModel(JavaModelWrapper):`
  * `class RandomForestModel(TreeEnsembleModel):`
  * `class GradientBoostedTreesModel(TreeEnsembleModel):`
  * `class GradientBoostedTrees(object):`
[GitHub] spark pull request: [SPARK-5428]: Declare the 'assembly' module at...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/4232#issuecomment-71787984 I am curious: what is the benefit of this change?
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3222#issuecomment-71787785 [Test build #26207 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26207/consoleFull) for PR 3222 at commit [`de47aaf`](https://github.com/apache/spark/commit/de47aafc5f721167d64ebc7b987b43375ef26798).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class AdaGradUpdater(`
  * `class DBN(val stackedRBM: StackedRBM)`
  * `class MLP(`
  * `class MomentumUpdater(val momentum: Double) extends Updater`
  * `class RBM(`
  * `class StackedRBM(val innerRBMs: Array[RBM])`
  * `case class MinstItem(label: Int, data: Array[Int])`
  * `class MinstDatasetReader(labelsFile: String, imagesFile: String)`
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3222#issuecomment-71787790 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26207/ Test PASSed.
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-71786983 [Test build #26206 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26206/consoleFull) for PR 3798 at commit [`19406cc`](https://github.com/apache/spark/commit/19406cce66672d74bd0b9c1d98cd8486c186f8ee).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class KafkaCluster(val kafkaParams: Map[String, String]) extends Serializable`
  * `case class LeaderOffset(host: String, port: Int, offset: Long)`
  * `class KafkaRDDPartition(`
  * `trait OffsetRange`
  * `trait HasOffsetRanges`
  * `class DeterministicKafkaInputDStreamCheckpointData extends DStreamCheckpointData(this)`
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-71786989 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26206/ Test PASSed.
[GitHub] spark pull request: [SPARK-5291][CORE] Add timestamp and reason wh...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4082#issuecomment-71786315 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26204/ Test PASSed.
[GitHub] spark pull request: [SPARK-5291][CORE] Add timestamp and reason wh...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4082#issuecomment-71786313 [Test build #26204 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26204/consoleFull) for PR 4082 at commit [`a026ff2`](https://github.com/apache/spark/commit/a026ff236510c1ab242e71981102c7d0590c8dd6).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class SparkListenerExecutorAdded(time: Long, executorId: String, executorInfo: ExecutorInfo)`
  * `case class SparkListenerExecutorRemoved(time: Long, executorId: String, reason: String)`
[GitHub] spark pull request: [SPARK-5188][BUILD] make-distribution.sh shoul...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3988#issuecomment-71786254 [Test build #26205 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26205/consoleFull) for PR 3988 at commit [`0f546e0`](https://github.com/apache/spark/commit/0f546e06fb8e5d4e5cf762fbc8d8cc7d11e1935f).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5188][BUILD] make-distribution.sh shoul...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3988#issuecomment-71786261 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26205/ Test PASSed.
[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/4068#discussion_r23667563

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -285,11 +285,22 @@ class Analyzer(catalog: Catalog,
         result
 
       // Resolve field names using the resolver.
-      case f @ GetField(child, fieldName) if !f.resolved && child.resolved =>
+      case f @ GetField(child, fieldName) if child.resolved =>
         child.dataType match {
           case StructType(fields) =>
-            val resolvedFieldName = fields.map(_.name).find(resolver(_, fieldName))
-            resolvedFieldName.map(n => f.copy(fieldName = n)).getOrElse(f)
+            val actualField = fields.filter(f => resolver(f.name, fieldName))
+            if (actualField.length == 0) {
+              sys.error(
+                s"No such struct field $fieldName in ${fields.map(_.name).mkString(", ")}")
--- End diff --
```

If we leave it unchanged, `CheckResolution` can't catch it. The reason is that we need a `Resolver` to check whether a `GetField` is resolved, but we can't get a `Resolver` inside `GetField`. Fortunately, we can catch it at runtime, as `GetField` will report an error if it can't find the required field. Which way should we prefer: leaving it unchanged, or reporting the error right away?
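To make the ambiguity concrete: with a case-insensitive resolver, a struct with fields `b` and `B` produces two matches for the reference `b`, which is exactly what SPARK-5278 wants to reject. A toy sketch of the filter-then-count logic (simplified to plain strings; the real Analyzer works on `StructField`s):

```scala
object AmbiguousFieldSketch {
  type Resolver = (String, String) => Boolean
  val caseInsensitive: Resolver = _.equalsIgnoreCase(_)

  // Resolve a requested field name against a struct's field names, failing
  // on both zero matches and ambiguous (multiple) matches.
  def resolveField(fieldNames: Seq[String], requested: String,
                   resolver: Resolver): String =
    fieldNames.filter(resolver(_, requested)) match {
      case Seq(single) => single
      case Seq() =>
        sys.error(s"No such struct field $requested in ${fieldNames.mkString(", ")}")
      case multiple =>
        sys.error(s"Ambiguous reference to fields ${multiple.mkString(", ")}")
    }

  def main(args: Array[String]): Unit =
    resolveField(Seq("b", "B"), "b", caseInsensitive) // throws: ambiguous reference
}
```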
[GitHub] spark pull request: [SPARK-5097][SQL] DataFrame
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/4173#discussion_r23667454

```diff
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
@@ -0,0 +1,606 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import scala.language.implicitConversions
+import scala.reflect.ClassTag
+import scala.collection.JavaConversions._
+
+import java.util.{ArrayList, List => JList}
+
+import com.fasterxml.jackson.core.JsonFactory
+import net.razorvine.pickle.Pickler
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.rdd.RDD
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.api.python.SerDeUtil
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.sql.catalyst.ScalaReflection
+import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.{Literal => LiteralExpr}
+import org.apache.spark.sql.catalyst.plans.{JoinType, Inner}
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.execution.{LogicalRDD, EvaluatePython}
+import org.apache.spark.sql.json.JsonRDD
+import org.apache.spark.sql.types.{NumericType, StructType}
+import org.apache.spark.util.Utils
+
+
+/**
+ * A collection of rows that have the same columns.
+ *
+ * A [[DataFrame]] is equivalent to a relational table in Spark SQL, and can be created using
+ * various functions in [[SQLContext]].
+ * {{{
+ *   val people = sqlContext.parquetFile("...")
+ * }}}
+ *
+ * Once created, it can be manipulated using the various domain-specific-language (DSL) functions
+ * defined in: [[DataFrame]] (this class), [[Column]], and [[dsl]] for Scala DSL.
+ *
+ * To select a column from the data frame, use the apply method:
+ * {{{
+ *   val ageCol = people("age")  // in Scala
+ *   Column ageCol = people.apply("age")  // in Java
+ * }}}
+ *
+ * Note that the [[Column]] type can also be manipulated through its various functions.
+ * {{
+ *   // The following creates a new column that increases everybody's age by 10.
+ *   people("age") + 10  // in Scala
+ * }}
+ *
+ * A more concrete example:
+ * {{{
+ *   // To create DataFrame using SQLContext
+ *   val people = sqlContext.parquetFile("...")
+ *   val department = sqlContext.parquetFile("...")
+ *
+ *   people.filter("age" > 30)
+ *     .join(department, people("deptId") === department("id"))
+ *     .groupBy(department("name"), "gender")
+ *     .agg(avg(people("salary")), max(people("age")))
+ * }}}
+ */
+// TODO: Improve documentation.
+class DataFrame protected[sql](
+    val sqlContext: SQLContext,
+    private val baseLogicalPlan: LogicalPlan,
+    operatorsEnabled: Boolean)
+  extends DataFrameSpecificApi with RDDApi[Row] {
+
+  protected[sql] def this(sqlContext: Option[SQLContext], plan: Option[LogicalPlan]) =
+    this(sqlContext.orNull, plan.orNull, sqlContext.isDefined && plan.isDefined)
+
+  protected[sql] def this(sqlContext: SQLContext, plan: LogicalPlan) = this(sqlContext, plan, true)
+
+  @transient protected[sql] lazy val queryExecution = sqlContext.executePlan(baseLogicalPlan)
+
+  @transient protected[sql] val logicalPlan: LogicalPlan = baseLogicalPlan match {
+    // For various commands (like DDL) and queries with side effects, we force query optimization to
+    // happen right away to let these side effects take place eagerly.
+    case _: Command | _: InsertIntoTable | _: CreateTableAsSelect[_] | _: WriteToFile =>
+      LogicalRDD(queryExecution.analyzed.output, queryExecution.toRdd)(sqlContext)
+    case _ =>
+      baseLogicalPlan
+  }
+
+  /**
+   * An implicit conversion function internal to this class for us to avoid doing
+   * "n
```
[GitHub] spark pull request: [SPARK-5430] move treeReduce and treeAggregate...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/4228#issuecomment-71785338 Not a very strong preference, but my take would be to keep them separate, as you only want users to use `treeReduce` when they know they want an aggregation tree. Also, the way `reduce` works right now is very familiar to existing users, and it'll be better not to touch that or add extra options to it. Also, thanks @mengxr for pulling this out to core; I've definitely found this useful in many other places.
[GitHub] spark pull request: [Spark-5406][MLlib] LocalLAPACK mode in RowMat...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4200#issuecomment-71783950 1x1 is definitely doable with multi-threaded native BLAS on a single machine. But usually the full SVD is not necessary for the application. This is why I want to put a soft limit and throw a warning message, which might help users re-consider whether they need a full SVD. That's interesting. Did GitHub show "added some commits 20 hours in the future"?
[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71783971 [Test build #26208 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26208/consoleFull) for PR 3951 at commit [`6e4ead8`](https://github.com/apache/spark/commit/6e4ead88855170a13806274d3103cbc4bc2a8563). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5430] move treeReduce and treeAggregate...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4228#issuecomment-71783095 I don't have strong preference. But note that `treeReduce` and `reduce` are quite different. `treeReduce` works better when there are large task results returned at around the same time (which is common for ML tasks), while `reduce` works better when there are many small task results returned in batches. If we put them together, users may think that better `depth` gives better scalability, which is not true. Again, no strong preference.
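For readers following along, a quick usage sketch of the two code paths being compared (spark-shell style, assuming a `SparkContext` named `sc` and that the PR has moved `treeReduce` into core; before the move it lives in MLlib's `RDDFunctions`):

```scala
// Sketch: compare reduce and treeReduce on the same RDD.
val rdd = sc.parallelize(1 to 1000000, numSlices = 100)

// Single round: all 100 task results come back to the driver at once.
val total = rdd.reduce(_ + _)

// depth = 2: partial sums are first combined on executors, so the driver
// only merges a handful of results; useful when task results are large.
val treeTotal = rdd.treeReduce(_ + _, depth = 2)

assert(total == treeTotal)
```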
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3222#issuecomment-71782811 [Test build #26207 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26207/consoleFull) for PR 3222 at commit [`de47aaf`](https://github.com/apache/spark/commit/de47aafc5f721167d64ebc7b987b43375ef26798). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-71782133 [Test build #26206 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26206/consoleFull) for PR 3798 at commit [`19406cc`](https://github.com/apache/spark/commit/19406cce66672d74bd0b9c1d98cd8486c186f8ee). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5188][BUILD] make-distribution.sh shoul...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3988#issuecomment-71781548 [Test build #26205 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26205/consoleFull) for PR 3988 at commit [`0f546e0`](https://github.com/apache/spark/commit/0f546e06fb8e5d4e5cf762fbc8d8cc7d11e1935f). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5291][CORE] Add timestamp and reason wh...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4082#issuecomment-71781550 [Test build #26204 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26204/consoleFull) for PR 4082 at commit [`a026ff2`](https://github.com/apache/spark/commit/a026ff236510c1ab242e71981102c7d0590c8dd6). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/4068#discussion_r23665528

```diff
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveResolutionSuite.scala ---
@@ -38,6 +38,15 @@ class HiveResolutionSuite extends HiveComparisonTest {
     sql("SELECT a[0].A.A from nested").queryExecution.analyzed
   }
 
+  test("SPARK-5278: check ambiguous reference to fields") {
+    jsonRDD(sparkContext.makeRDD(
+      """{"a": [{"b": 1, "B": 2}]}""" :: Nil)).registerTempTable("nested")
+    val exception = intercept[RuntimeException] {
+      println(sql("SELECT a[0].b from nested").queryExecution.analyzed)
--- End diff --
```

Can you add a comment here to explain what we are expecting? Also, you can remove the `println`.
[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/4068#discussion_r23665510

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -285,11 +285,22 @@ class Analyzer(catalog: Catalog,
         result
 
       // Resolve field names using the resolver.
-      case f @ GetField(child, fieldName) if !f.resolved && child.resolved =>
+      case f @ GetField(child, fieldName) if child.resolved =>
         child.dataType match {
           case StructType(fields) =>
-            val resolvedFieldName = fields.map(_.name).find(resolver(_, fieldName))
-            resolvedFieldName.map(n => f.copy(fieldName = n)).getOrElse(f)
+            val actualField = fields.filter(f => resolver(f.name, fieldName))
+            if (actualField.length == 0) {
+              sys.error(
+                s"No such struct field $fieldName in ${fields.map(_.name).mkString(", ")}")
--- End diff --
```

I think `CheckResolution` should catch it. If we cannot resolve it, just leave it unchanged. Can you see if there is a unit test for this? If not, can you add one? Maybe we can also log it like what `LogicalPlan.resolve` does.
[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-71779796 It also returns empty bins, just to be compatible with the present API. Hopefully that's not a problem.
[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/4231#discussion_r23665261

```diff
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/TreePoint.scala ---
@@ -96,14 +96,12 @@ private[tree] object TreePoint {
    * Find bin for one (labeledPoint, feature).
    *
    * @param featureArity  0 for continuous features; number of categories for categorical features.
-   * @param isUnorderedFeature  (only applies if feature is categorical)
--- End diff --
```

@jkbradley I removed this param as it is unused. I don't think it is a problem since all tests pass.
[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...
Github user nightwolfzor commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71779075 Any chance this one will make it into the 1.3 release? We'd really like to see this one!
[GitHub] spark pull request: [SPARK-5196][SQL] Support `comment` in Create ...
Github user OopsOutOfMemory commented on the pull request: https://github.com/apache/spark/pull/3999#issuecomment-71778526 ping @marmbrus @yhuai I think this is ready to go.
[GitHub] spark pull request: [SPARK-5440][pyspark] Add toLocalIterator to p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4237#issuecomment-71777208 [Test build #26203 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26203/consoleFull) for PR 4237 at commit [`0cdc8f8`](https://github.com/apache/spark/commit/0cdc8f87a02c5bf20f4f61a4dbd83d16431a1af9).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5440][pyspark] Add toLocalIterator to p...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4237#issuecomment-71777210 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26203/ Test FAILed.
[GitHub] spark pull request: [SPARK-5198][Mesos] Change executorId more uni...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/3994#issuecomment-71776175 I'll also close this PR. I misunderstood Mesos; see #4170.
[GitHub] spark pull request: [SPARK-5198][Mesos] Change executorId more uni...
Github user jongyoul closed the pull request at: https://github.com/apache/spark/pull/3994
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user jongyoul closed the pull request at: https://github.com/apache/spark/pull/4170
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/4170#issuecomment-71776099 I'll close this PR. It's the wrong approach.
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/4170#issuecomment-71775404 @tnachen @mateiz Sorry for taking up so much of your time. I've found that only one executor process runs at any time, and I now understand that an executor can run multiple tasks at the same time. I had believed each executor was launched separately whenever the driver called launchTasks.
[GitHub] spark pull request: [SPARK-4586][MLLIB] Python API for ML pipeline...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4151#issuecomment-71774770 [Test build #26202 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26202/consoleFull) for PR 4151 at commit [`fc59a02`](https://github.com/apache/spark/commit/fc59a022f767750e0b4796b83fa7f1da1e28fb5e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4586][MLLIB] Python API for ML pipeline...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4151#issuecomment-71774775 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26202/ Test PASSed.
[GitHub] spark pull request: [Spark-5406][MLlib] LocalLAPACK mode in RowMat...
Github user hhbyyh commented on the pull request: https://github.com/apache/spark/pull/4200#issuecomment-71774727 @mengxr Sorry, I was on something else yesterday. Are you suggesting putting a soft limit in the `auto` mode for local computation and keeping the hard limit for the LocalLAPACK case? I agree with the general idea. When I tried it on my local machine, it took only about 2 hours to compute the full SVD of a 10K * 10K matrix, and I haven't even installed NativeSystemBLAS. So I guess the limit for a single machine will be quite near the hard limit (17515). I'll try distributed mode today.
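To make the experiment above easy to reproduce, here is a rough Scala sketch of a full SVD on a dense square RowMatrix (the parameters and random data are illustrative, an existing SparkContext `sc` is assumed, and this is not the exact benchmark described):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    val n = 10000
    // Random dense rows; in practice the matrix would come from real data.
    val rows = sc.parallelize(0 until n, 100).map { _ =>
      Vectors.dense(Array.fill(n)(scala.util.Random.nextDouble()))
    }
    val mat = new RowMatrix(rows, n, n)
    // k = n requests the full decomposition; computeSVD internally chooses
    // between local (LAPACK) and distributed computation.
    val svd = mat.computeSVD(n, computeU = false)
    println(s"computed ${svd.s.size} singular values")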
[GitHub] spark pull request: SPARK-5425: Use synchronised methods in system...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4220#issuecomment-71774621 LGTM. I was worried about `System.getProperty()`'s thread-safety, but I assume it's ultimately synchronized since the underlying store is a `Properties` object.
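To make the thread-safety point concrete, here is a minimal standalone Scala sketch (hypothetical names, not from the PR): the store returned by `System.getProperties` is a `java.util.Properties`, which extends the synchronized `java.util.Hashtable`, so individual get/put calls are atomic:

    object SysPropDemo {
      def main(args: Array[String]): Unit = {
        val threads = (1 to 4).map { i =>
          new Thread(new Runnable {
            override def run(): Unit = {
              // Individual get/put calls are synchronized; compound
              // check-then-act sequences still need external locking.
              System.setProperty("demo.key." + i, i.toString)
              assert(System.getProperty("demo.key." + i) == i.toString)
            }
          })
        }
        threads.foreach(_.start())
        threads.foreach(_.join())
        println("total properties: " + System.getProperties.stringPropertyNames().size())
      }
    }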
[GitHub] spark pull request: SPARK-5425: Use synchronised methods in system...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4220#discussion_r23663471 --- Diff: core/src/test/scala/org/apache/spark/SparkConfSuite.scala --- @@ -17,6 +17,10 @@ package org.apache.spark +import java.util.concurrent.{TimeUnit, Executors} --- End diff -- ultra nit: sort imports
[GitHub] spark pull request: SPARK-1934 [CORE] "this" reference escape to "...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/4225#issuecomment-71774422 LGTM. > @zsxwing also reported a similar problem in BlockManager in the JIRA, but I can't find a similar pattern there. Maybe it was subsequently fixed? I checked the history. It's already fixed in #3087
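For readers following SPARK-1934, here is a standalone Scala sketch of the general "this"-escape pattern being fixed (hypothetical classes, not the Spark code in question): publishing `this` from a constructor, for example by starting a thread there, lets another thread observe a partially constructed object:

    // BUG: the thread can run before `capacity` is assigned and print 0.
    class LeakyCache {
      new Thread(new Runnable {
        override def run(): Unit = println(s"capacity = $capacity")
      }).start()
      val capacity: Int = 64
    }

    // Fix: finish construction first, then start background work.
    class SafeCache {
      val capacity: Int = 64
      def startLogger(): Unit = new Thread(new Runnable {
        override def run(): Unit = println(s"capacity = $capacity")
      }).start()
    }

    object SafeCache {
      def apply(): SafeCache = { val c = new SafeCache; c.startLogger(); c }
    }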
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/4170#issuecomment-71774383 @jongyoul So an executor can only "launch" one task at a time, but it can have multiple tasks running simultaneously, as you mentioned. It doesn't matter whether they're all part of the same launchTasks message or separate ones; as long as the framework and executor id are the same, the task will be launched in the same executor.
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/4170#issuecomment-71773877 @tnachen Yes, I fully understand reusing an executor while a framework is alive. However, do we launch two tasks on the same executor? What you've answered is that they are launched at the same time, isn't it?
[GitHub] spark pull request: [SPARK-4809] Rework Guava library shading.
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3658#issuecomment-71773844 No, I'm done with it. Thanks for taking a look.
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/4170#issuecomment-71773647 If you read the fine-grained mode source code, you'll notice that Spark is using the slave id as the executor id, which is what we discussed on the Mesos mailing list: the executor will be re-used as long as all tasks reuse the same executor id. Therefore, it's only launching one executor per slave, and if the executor dies, Mesos will relaunch it when a task asks for it again.
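As a rough illustration of that reuse rule (assuming the Mesos Java protobuf API; this is not Spark's actual scheduler code), tasks on the same slave that carry the same ExecutorID run inside one executor process:

    import org.apache.mesos.Protos._

    def taskForSlave(slaveId: SlaveID, taskNum: Int): TaskInfo = {
      val executor = ExecutorInfo.newBuilder()
        // Reuse the slave id as the executor id, as fine-grained mode does.
        .setExecutorId(ExecutorID.newBuilder().setValue(slaveId.getValue))
        .setCommand(CommandInfo.newBuilder().setValue("bin/spark-class ...")) // hypothetical command
        .build()
      TaskInfo.newBuilder()
        .setName(s"task-$taskNum")
        .setTaskId(TaskID.newBuilder().setValue(taskNum.toString))
        .setSlaveId(slaveId)
        .setExecutor(executor) // same executor id => same executor process
        .build()
    }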
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3715#discussion_r23663046 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala --- @@ -313,6 +313,7 @@ private object SpecialLengths { val PYTHON_EXCEPTION_THROWN = -2 val TIMING_DATA = -3 val END_OF_STREAM = -4 + val NULL = -5 --- End diff -- @tdas I think that this same null-handling change has been proposed before but until now I don't think we had a great reason to pull it in, since none of our internal APIs relied on it and we were worried that it might mask the presence of bugs. Now that we have a need for it, though, it might be okay to pull in here.
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/4170#issuecomment-71773477 I believed that when the Mesos driver calls launchTasks, the container runs the command `bin/spark-class` every time a task runs. In my Q&A email to the Mesos list, @tnachen answered that one container can run multiple commands simultaneously, and some of my tests show two tasks running simultaneously, because they write to the same log file at the same time. Digging through the code, I found no limit on launching tasks on a Mesos container. However, @mateiz told me that one executor only runs a single JVM and launches a single task at any time.
[GitHub] spark pull request: [SPARK-5440][pyspark] Add toLocalIterator to p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4237#issuecomment-71773354 [Test build #26203 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26203/consoleFull) for PR 4237 at commit [`0cdc8f8`](https://github.com/apache/spark/commit/0cdc8f87a02c5bf20f4f61a4dbd83d16431a1af9). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5440][pyspark] Add toLocalIterator to p...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4237#issuecomment-71773281 Jenkins, this is ok to test.
[GitHub] spark pull request: [SPARK-5361]python tuple not supported while c...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4146#issuecomment-71773058 @wingchen Actually, just to be clear here, is this problem related to tuple handling, or is the actual issue related to multiple Java <-> Python conversions not working correctly? If there's nothing tuple-specific about this, do you mind editing the PR title, description, and JIRA to reflect this?
[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71772711 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26201/ Test FAILed.
[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71772703 [Test build #26201 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26201/consoleFull) for PR 3951 at commit [`7dc1aab`](https://github.com/apache/spark/commit/7dc1aab286d47565b734b472623626b79417b442). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class TreeEnsembleModel(JavaModelWrapper):` * `class DecisionTreeModel(JavaModelWrapper):` * `class RandomForestModel(TreeEnsembleModel):` * `class GradientBoostedTreesModel(TreeEnsembleModel):` * `class GradientBoostedTrees(object):`
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/4170#issuecomment-71772544 I don't know the behaviour in coarse-grained mode, but in fine-grained mode we use multiple JVMs for running tasks: we run spark-class via the launcher, which means we launch a JVM per task. Am I wrong?
[GitHub] spark pull request: [SQL] Implement Describe Table for SQLContext
Github user OopsOutOfMemory commented on the pull request: https://github.com/apache/spark/pull/4207#issuecomment-71772475 hi, @yanbohappy Thanks for working on this. By the way, this JIRA (SPARK-5324), `Results of describe can't be queried`, is mainly focused on making the describe command queryable like a table, not on adding a describe command to SQLContext; shall we keep this PR focused on its own JIRA issue? Also, there are no test suites here to demonstrate the bug fix. Would you mind closing this PR? If you have good advice on adding describe table, you can refer to SPARK-5135 / #4227 and comment on my PR. I'd be very pleased : )
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3715#discussion_r23662537 --- Diff: make-distribution.sh --- @@ -188,6 +188,7 @@ echo "Build flags: $@" >> "$DISTDIR/RELEASE" # Copy jars cp "$SPARK_HOME"/assembly/target/scala*/*assembly*hadoop*.jar "$DISTDIR/lib/" cp "$SPARK_HOME"/examples/target/scala*/spark-examples*.jar "$DISTDIR/lib/" +cp "$SPARK_HOME"/external/kafka/scala*/*kafka*assembly*.jar "$DISTDIR/lib/" --- End diff -- I'm not inclined to block this PR on #4215. We can make the doc fix separately later.
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3715#discussion_r23662499 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala --- @@ -313,6 +313,7 @@ private object SpecialLengths { val PYTHON_EXCEPTION_THROWN = -2 val TIMING_DATA = -3 val END_OF_STREAM = -4 + val NULL = -5 --- End diff -- So this patch tries to fix a bug in Python regarding null values? If so, that probably should be a different patch from this Kafka patch.
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3715#discussion_r23662457
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
@@ -374,49 +375,63 @@ private[spark] object PythonRDD extends Logging {
     // The right way to implement this would be to use TypeTags to get the full
     // type of T. Since I don't want to introduce breaking changes throughout the
     // entire Spark API, I have to use this hacky approach:
+    def write(bytes: Array[Byte]) {
+      if (bytes == null) {
+        dataOut.writeInt(SpecialLengths.NULL)
+      } else {
+        dataOut.writeInt(bytes.length)
+        dataOut.write(bytes)
+      }
+    }
+
+    def writeS(str: String) {
+      if (str == null) {
+        dataOut.writeInt(SpecialLengths.NULL)
+      } else {
+        writeUTF(str, dataOut)
+      }
+    }
+
     if (iter.hasNext) {
       val first = iter.next()
       val newIter = Seq(first).iterator ++ iter
       first match {
         case arr: Array[Byte] =>
-          newIter.asInstanceOf[Iterator[Array[Byte]]].foreach { bytes =>
-            dataOut.writeInt(bytes.length)
-            dataOut.write(bytes)
-          }
+          newIter.asInstanceOf[Iterator[Array[Byte]]].foreach(write)
         case string: String =>
-          newIter.asInstanceOf[Iterator[String]].foreach { str =>
-            writeUTF(str, dataOut)
-          }
+          newIter.asInstanceOf[Iterator[String]].foreach(writeS)
         case stream: PortableDataStream =>
           newIter.asInstanceOf[Iterator[PortableDataStream]].foreach { stream =>
-            val bytes = stream.toArray()
-            dataOut.writeInt(bytes.length)
-            dataOut.write(bytes)
+            write(stream.toArray())
          }
        case (key: String, stream: PortableDataStream) =>
          newIter.asInstanceOf[Iterator[(String, PortableDataStream)]].foreach {
            case (key, stream) =>
-              writeUTF(key, dataOut)
-              val bytes = stream.toArray()
-              dataOut.writeInt(bytes.length)
-              dataOut.write(bytes)
+              writeS(key)
+              write(stream.toArray())
          }
        case (key: String, value: String) =>
          newIter.asInstanceOf[Iterator[(String, String)]].foreach {
            case (key, value) =>
-              writeUTF(key, dataOut)
-              writeUTF(value, dataOut)
+              writeS(key)
+              writeS(value)
          }
        case (key: Array[Byte], value: Array[Byte]) =>
          newIter.asInstanceOf[Iterator[(Array[Byte], Array[Byte])]].foreach {
            case (key, value) =>
-              dataOut.writeInt(key.length)
-              dataOut.write(key)
-              dataOut.writeInt(value.length)
-              dataOut.write(value)
+              write(key)
+              write(value)
          }
+        // key is null
+        case (null, v: Array[Byte]) =>
--- End diff --
nit: Also, for consistency with other "cases", v --> value.
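To see the framing convention end to end, here is a standalone Scala sketch (a simplified stand-in, not the Spark code itself) of a length-prefixed frame writer with the NULL sentinel and the matching read side:

    import java.io._

    object NullFraming {
      val NULL = -5 // mirrors SpecialLengths.NULL

      def write(out: DataOutputStream, bytes: Array[Byte]): Unit =
        if (bytes == null) out.writeInt(NULL)
        else { out.writeInt(bytes.length); out.write(bytes) }

      def read(in: DataInputStream): Array[Byte] = in.readInt() match {
        case NULL => null
        case n => val buf = new Array[Byte](n); in.readFully(buf); buf
      }

      def main(args: Array[String]): Unit = {
        val bos = new ByteArrayOutputStream()
        write(new DataOutputStream(bos), null) // a null payload survives the round trip
        val in = new DataInputStream(new ByteArrayInputStream(bos.toByteArray))
        assert(read(in) == null)
      }
    }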
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/4170#issuecomment-71771833 @jongyoul Sorry, I didn't get to finish reviewing the PR. I agree with Matei that in Spark's usage of Mesos it doesn't make sense to give tasks memory, since we share the same executor, which is kept running.
[GitHub] spark pull request: [SPARK-5361]python tuple not supported while c...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4146#issuecomment-71771191 Let me take one final look to see if I can pull this in for 1.2.1 (since we're cutting a new RC tonight). In general, this looks safe, since it only adds new code paths in cases where we'd otherwise throw an exception, as opposed to changing the behavior of existing code paths. If things check out, I'll pull it in for both 1.3.0 and 1.2.1.
[GitHub] spark pull request: [SPARK-5135][SQL] Add support for describe tab...
Github user OopsOutOfMemory commented on the pull request: https://github.com/apache/spark/pull/4227#issuecomment-71770856 yeah, @rxin, would you like to talk with @marmbrus about what we'd like to show in `describe extended table` in SQLContext and then file a JIRA issue, so that we can do it separately rather than in this PR?
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/4170#issuecomment-71770662 Right, as I said, it doesn't make sense to offer the task memory twice. Each executor is a *single* JVM, and JVMs cannot scale their memory up and down. The executor's memory is set to the same value that we configure that JVM with via `-Xmx`. There's no way to make tasks use more memory than that, no matter how many tasks are running there.
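A tiny Scala illustration of that point (a hypothetical demo, not Spark code): a JVM's heap ceiling is fixed at launch by `-Xmx`, so an executor JVM cannot grow later to absorb extra task memory:

    object HeapCeiling {
      def main(args: Array[String]): Unit = {
        // Run with e.g. `scala -J-Xmx5g HeapCeiling`: prints roughly 5 GB,
        // no matter how many tasks later run inside this process.
        println(s"max heap = ${Runtime.getRuntime.maxMemory() / (1024 * 1024)} MB")
      }
    }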
[GitHub] spark pull request: [SQL] Implement Describe Table for SQLContext
Github user OopsOutOfMemory commented on a diff in the pull request: https://github.com/apache/spark/pull/4207#discussion_r23661855 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -1034,4 +1034,11 @@ class SQLQuerySuite extends QueryTest with BeforeAndAfterAll { rdd.registerTempTable("distinctData") checkAnswer(sql("SELECT COUNT(DISTINCT key,value) FROM distinctData"), Row(2)) } + + test("describe table") { +checkAnswer(sql("DESCRIBE EXTENDED testData"),Seq( --- End diff -- EXTENDED? This seems no different from plain describe. I think we'd better do this later, after the discussion.
[GitHub] spark pull request: [SQL] Implement Describe Table for SQLContext
Github user OopsOutOfMemory commented on a diff in the pull request: https://github.com/apache/spark/pull/4207#discussion_r23661781 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -1034,4 +1034,11 @@ class SQLQuerySuite extends QueryTest with BeforeAndAfterAll { rdd.registerTempTable("distinctData") checkAnswer(sql("SELECT COUNT(DISTINCT key,value) FROM distinctData"), Row(2)) } + + test("describe table") { +checkAnswer(sql("DESCRIBE EXTENDED testData"),Seq( +Row("key","IntegerType",null), Row("value","StringType",null) --- End diff -- `IntegerType` and `StringType`? These should be `int` and `string`.
[GitHub] spark pull request: [SQL] Implement Describe Table for SQLContext
Github user OopsOutOfMemory commented on the pull request: https://github.com/apache/spark/pull/4207#issuecomment-71770183 hi, @yanbohappy, I've already been working on this. I took a look at your PR, but it seems a little hacky to me. Would you like to review mine?
[GitHub] spark pull request: [SPARK-5376][Mesos] MesosExecutor should have ...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/4170#issuecomment-71770002 Sorry, I've now shown you my configuration: 5G for SPARK_EXECUTOR_MEMORY and 5 for spark.task.cpus. In my screenshot, we launch two tasks on the same machine. Don't you think it's good to offer the task memory twice? My PR gives correct resource management information to the Mesos master. For CPUs, I don't know the proper value for executor cpus, only that it should not be CPUS_TASK_CPUS. Please recommend a value.
[GitHub] spark pull request: [WIP][SPARK-5341] Use maven coordinates as dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4215#issuecomment-71769759 [Test build #26200 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26200/consoleFull) for PR 4215 at commit [`3705907`](https://github.com/apache/spark/commit/3705907dc2f61fa68f64df14a23622cc40aff9d8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` * (4) the main class for the child`
[GitHub] spark pull request: [WIP][SPARK-5341] Use maven coordinates as dep...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4215#issuecomment-71769764 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26200/ Test FAILed.
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3715#discussion_r23661437
--- Diff: python/pyspark/streaming/kafka.py ---
@@ -0,0 +1,82 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from py4j.java_collections import MapConverter
+from py4j.java_gateway import java_import, Py4JError
+
+from pyspark.storagelevel import StorageLevel
+from pyspark.serializers import PairDeserializer, NoOpSerializer
+from pyspark.streaming import DStream
+
+__all__ = ['KafkaUtils', 'utf8_decoder']
+
+
+def utf8_decoder(s):
+    """ Decode the unicode as UTF-8 """
+    return s and s.decode('utf-8')
+
+
+class KafkaUtils(object):
+
+    @staticmethod
+    def createStream(ssc, zkQuorum, groupId, topics,
+                     storageLevel=StorageLevel.MEMORY_AND_DISK_SER_2,
+                     keyDecoder=utf8_decoder, valueDecoder=utf8_decoder):
+        """
+        Create an input stream that pulls messages from a Kafka Broker.
+
+        :param ssc: StreamingContext object
+        :param zkQuorum: Zookeeper quorum (hostname:port,hostname:port,..).
+        :param groupId: The group id for this consumer.
+        :param topics: Dict of (topic_name -> numPartitions) to consume.
+                       Each partition is consumed in its own thread.
+        :param storageLevel: RDD storage level.
+        :param keyDecoder: A function used to decode key
+        :param valueDecoder: A function used to decode value
+        :return: A DStream object
+        """
+        java_import(ssc._jvm, "org.apache.spark.streaming.kafka.KafkaUtils")
+
+        param = {
+            "zookeeper.connect": zkQuorum,
+            "group.id": groupId,
+            "zookeeper.connection.timeout.ms": "1",
+        }
+        if not isinstance(topics, dict):
+            raise TypeError("topics should be dict")
+        jtopics = MapConverter().convert(topics, ssc.sparkContext._gateway._gateway_client)
+        jparam = MapConverter().convert(param, ssc.sparkContext._gateway._gateway_client)
+        jlevel = ssc._sc._getJavaStorageLevel(storageLevel)
+
+        def getClassByName(name):
+            return ssc._jvm.org.apache.spark.util.Utils.classForName(name)
+
+        try:
+            array = getClassByName("[B")
+            decoder = getClassByName("kafka.serializer.DefaultDecoder")
+            jstream = ssc._jvm.KafkaUtils.createStream(ssc._jssc, array, array, decoder, decoder,
+                                                       jparam, jtopics, jlevel)
+        except Py4JError, e:
+            # TODO: use --jar once it also work on driver
+            if not e.message or 'call a package' in e.message:
--- End diff --
This is clever; the 'call a package' errors are _really_ confusing to users, so this message is pretty helpful.
[GitHub] spark pull request: [SPARK-4809] Rework Guava library shading.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3658#issuecomment-71769526 [Test build #26199 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26199/consoleFull) for PR 3658 at commit [`3c93e42`](https://github.com/apache/spark/commit/3c93e42a5e9474b33aa53f7fd6f22998d44a8c52). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5135][SQL] Add support for describe tab...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4227#issuecomment-71769478 Thanks for submitting the new version. Are these two PRs working on the same thing? https://github.com/apache/spark/pull/4207 Would be great if you two can chime in on each other's PR.
[GitHub] spark pull request: [SPARK-4809] Rework Guava library shading.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3658#issuecomment-71769535 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26199/ Test PASSed.
[GitHub] spark pull request: [SPARK-5441][pyspark] Make SerDeUtil PairRDD t...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4236#issuecomment-71769501 Hey thanks for this - mind adding a regression test that fails on the old code?
[GitHub] spark pull request: [SQL] Implement Describe Table for SQLContext
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4207#issuecomment-71769449 Thanks for submitting the pull request. Are these two PRs working on the same thing? https://github.com/apache/spark/pull/4227 Would be great if you two can chime in on each other's PR.
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3715#issuecomment-71769379 Should we have tests for this? Do we have tests for the other Python streaming sources?
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3715#discussion_r23661298 --- Diff: make-distribution.sh --- @@ -188,6 +188,7 @@ echo "Build flags: $@" >> "$DISTDIR/RELEASE" # Copy jars cp "$SPARK_HOME"/assembly/target/scala*/*assembly*hadoop*.jar "$DISTDIR/lib/" cp "$SPARK_HOME"/examples/target/scala*/spark-examples*.jar "$DISTDIR/lib/" +cp "$SPARK_HOME"/external/kafka/scala*/*kafka*assembly*.jar "$DISTDIR/lib/" --- End diff -- Rather than packaging this with the release, can we just ask users to add the Maven coordinates when launching it? This will add a fairly large amount to the binary size of Spark (especially if we add other ones in the future).
[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3715#discussion_r23661317 --- Diff: make-distribution.sh --- @@ -188,6 +188,7 @@ echo "Build flags: $@" >> "$DISTDIR/RELEASE" # Copy jars cp "$SPARK_HOME"/assembly/target/scala*/*assembly*hadoop*.jar "$DISTDIR/lib/" cp "$SPARK_HOME"/examples/target/scala*/spark-examples*.jar "$DISTDIR/lib/" +cp "$SPARK_HOME"/external/kafka/scala*/*kafka*assembly*.jar "$DISTDIR/lib/" --- End diff -- I'm assuming that #4215 gets merged in all this.
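For concreteness, if the `--packages` option proposed in #4215 lands, the suggestion above would look roughly like this on the command line (the coordinates and script name here are illustrative, not final):

    bin/spark-submit \
      --packages org.apache.spark:spark-streaming-kafka_2.10:1.3.0 \
      my_kafka_wordcount.py localhost:2181 my-consumer-group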
[GitHub] spark pull request: [SPARK-4586][MLLIB] Python API for ML pipeline...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4151#issuecomment-71769162 [Test build #26202 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26202/consoleFull) for PR 4151 at commit [`fc59a02`](https://github.com/apache/spark/commit/fc59a022f767750e0b4796b83fa7f1da1e28fb5e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5430] move treeReduce and treeAggregate...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4228#discussion_r23661228
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -901,6 +901,38 @@ abstract class RDD[T: ClassTag](
   }
 
+  /**
+   * Reduces the elements of this RDD in a multi-level tree pattern.
+   *
+   * @param depth suggested depth of the tree (default: 2)
+   * @see [[org.apache.spark.rdd.RDD#reduce]]
+   */
+  def treeReduce(f: (T, T) => T, depth: Int = 2): T = {
--- End diff --
Even in Scala we should avoid default arguments if possible.
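A minimal Scala sketch of the overload alternative (a simplified local-collection stand-in, not the actual RDD method under review): Java callers cannot see Scala default values, so an explicit no-depth overload keeps the API usable from both languages:

    class TreeOps[T](data: Seq[T]) {
      def treeReduce(f: (T, T) => T): T = treeReduce(f, 2) // replaces the default argument

      def treeReduce(f: (T, T) => T, depth: Int): T = {
        require(depth >= 1, s"depth must be >= 1 but got $depth")
        require(data.nonEmpty, "cannot reduce an empty collection")
        // Shrink by a constant fan-in per level until one element remains.
        val fanIn = math.max(2, math.ceil(math.pow(data.size.toDouble, 1.0 / depth)).toInt)
        var level: Seq[T] = data
        while (level.size > 1) {
          level = level.grouped(fanIn).map(_.reduce(f)).toVector
        }
        level.head
      }
    }

    // Usage: new TreeOps(1 to 100).treeReduce(_ + _) == 5050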
[GitHub] spark pull request: [SPARK-5097][SQL] Test cases for DataFrame exp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4235