[GitHub] spark pull request: [SPARK-5174][SPARK-5175] provide more APIs in ...

2015-01-09 Thread CodingCat
Github user CodingCat commented on a diff in the pull request:

https://github.com/apache/spark/pull/3984#discussion_r22754263
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/receiver/ActorReceiver.scala
 ---
@@ -149,43 +181,61 @@ private[streaming] class ActorReceiver[T: ClassTag](
   class Supervisor extends Actor {
 
 override val supervisorStrategy = receiverSupervisorStrategy
-val worker = context.actorOf(props, name)
-logInfo("Started receiver worker at: " + worker.path)
-
-val n: AtomicInteger = new AtomicInteger(0)
-val hiccups: AtomicInteger = new AtomicInteger(0)
-
--- End diff --

The supervisor is single-threaded; I don't think we have a scenario where we 
update concurrently.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5174][SPARK-5175] provide more APIs in ...

2015-01-09 Thread sarutak
Github user sarutak commented on a diff in the pull request:

https://github.com/apache/spark/pull/3984#discussion_r22754237
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/receiver/ActorReceiver.scala
 ---
@@ -149,43 +181,61 @@ private[streaming] class ActorReceiver[T: ClassTag](
   class Supervisor extends Actor {
 
 override val supervisorStrategy = receiverSupervisorStrategy
-val worker = context.actorOf(props, name)
-logInfo("Started receiver worker at: " + worker.path)
-
-val n: AtomicInteger = new AtomicInteger(0)
-val hiccups: AtomicInteger = new AtomicInteger(0)
-
--- End diff --

Why did you stop using AtomicInteger? 





[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3823#issuecomment-69431702
  
  [Test build #25348 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25348/consoleFull)
 for   PR 3823 at commit 
[`133c43e`](https://github.com/apache/spark/commit/133c43e79482d2f88392dc287aa185564c2ed557).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...

2015-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3823#issuecomment-69431707
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25348/
Test PASSed.





[GitHub] spark pull request: [SPARK-3541][MLLIB] New ALS implementation wit...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3720#issuecomment-69432582
  
  [Test build #25353 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25353/consoleFull)
 for   PR 3720 at commit 
[`dd0d0e8`](https://github.com/apache/spark/commit/dd0d0e8ecd36b9e607306dd170d1e22437180389).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4406] [MLib] FIX: Validate k in SVD

2015-01-09 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3945#issuecomment-69432542
  
Merged into master. Thanks!





[GitHub] spark pull request: SPARK-5018 [MLlib] [WIP] Make MultivariateGaus...

2015-01-09 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3923#discussion_r22755110
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/distribution/MultivariateGaussian.scala
 ---
@@ -15,11 +15,13 @@
  * limitations under the License.
  */
 
-package org.apache.spark.mllib.stat.impl
+package org.apache.spark.mllib.stat.distribution
 
 import breeze.linalg.{DenseVector => DBV, DenseMatrix => DBM, diag, max, eigSym}
 
+import org.apache.spark.mllib.linalg.{Vectors, Vector, Matrices, Matrix}
 import org.apache.spark.mllib.util.MLUtils
+import org.apache.spark.annotation.DeveloperApi;
--- End diff --

Sort the imports alphabetically.





[GitHub] spark pull request: SPARK-5018 [MLlib] [WIP] Make MultivariateGaus...

2015-01-09 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3923#discussion_r22755112
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/distribution/MultivariateGaussian.scala
 ---
@@ -15,11 +15,13 @@
  * limitations under the License.
  */
 
-package org.apache.spark.mllib.stat.impl
+package org.apache.spark.mllib.stat.distribution
 
 import breeze.linalg.{DenseVector => DBV, DenseMatrix => DBM, diag, max, eigSym}
 
+import org.apache.spark.mllib.linalg.{Vectors, Vector, Matrices, Matrix}
 import org.apache.spark.mllib.util.MLUtils
+import org.apache.spark.annotation.DeveloperApi;
 
 /**
  * This class provides basic functionality for a Multivariate Gaussian 
(Normal) Distribution. In
--- End diff --

Please add `:: DeveloperApi ::` before `This class ...`. We need it to 
generate the doc correctly. You can check other `@DeveloperApi` usage as 
examples.
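
The convention looks roughly like this (an illustrative sketch following the `@DeveloperApi` pattern used elsewhere in Spark; the class name is made up, not this PR's code):

```scala
import org.apache.spark.annotation.DeveloperApi

/**
 * :: DeveloperApi ::
 * This class provides basic functionality for ...
 *
 * The `:: DeveloperApi ::` marker on the first scaladoc line is what the
 * doc generation uses to badge the API, in addition to the annotation.
 */
@DeveloperApi
class SomeDeveloperFacingClass
```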





[GitHub] spark pull request: [SPARK-5141][SQL]CaseInsensitiveMap throws jav...

2015-01-09 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/3944#issuecomment-69440503
  
Merging in master & branch-1.2. Thanks!






[GitHub] spark pull request: [SPARK-5187][SQL] Fix caching of tables with H...

2015-01-09 Thread sarutak
Github user sarutak commented on a diff in the pull request:

https://github.com/apache/spark/pull/3987#discussion_r22753235
  
--- Diff: 
sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim13.scala ---
@@ -53,7 +53,7 @@ import scala.language.implicitConversions
  *
  * @param functionClassName UDF class name
  */
-class HiveFunctionWrapper(var functionClassName: String) extends 
java.io.Externalizable {
+case class HiveFunctionWrapper(var functionClassName: String) extends 
java.io.Externalizable {
--- End diff --

Ah, I see. It's `Externalizable`.





[GitHub] spark pull request: [SPARK-5188][BUILD] make-distribution.sh shoul...

2015-01-09 Thread sarutak
GitHub user sarutak opened a pull request:

https://github.com/apache/spark/pull/3988

[SPARK-5188][BUILD] make-distribution.sh should support curl, not only wget 
to get Tachyon

When we use `make-distribution.sh` with the `--with-tachyon` option, Tachyon 
is downloaded with the `wget` command, but some systems don't have `wget` by 
default (Mac OS X doesn't).
Other scripts, like build/mvn and build/sbt, support not only `wget` but also 
`curl`, so `make-distribution.sh` should support `curl` too.
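
A sketch of the kind of fallback this is after (a hypothetical helper, not the actual `make-distribution.sh` change): try `curl` first, fall back to `wget`, and fail with a clear message if neither is installed.

```shell
# Hypothetical downloader with curl/wget fallback.
download() {
  local url="$1" out="$2"
  if command -v curl > /dev/null 2>&1; then
    curl -L "$url" -o "$out"
  elif command -v wget > /dev/null 2>&1; then
    wget "$url" -O "$out"
  else
    echo "Neither curl nor wget is installed; cannot download $url" >&2
    return 1
  fi
}
```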

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sarutak/spark SPARK-5188

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3988.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3988


commit 83b49b5e2def5df861c21cad1c6c72be3a460e09
Author: Kousuke Saruta saru...@oss.nttdata.co.jp
Date:   2015-01-10T00:51:17Z

Modified make-distribution.sh so that we use curl, not only wget to get 
tachyon







[GitHub] spark pull request: [SPARK-5188][BUILD] make-distribution.sh shoul...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3988#issuecomment-69427935
  
  [Test build #25349 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25349/consoleFull)
 for   PR 3988 at commit 
[`83b49b5`](https://github.com/apache/spark/commit/83b49b5e2def5df861c21cad1c6c72be3a460e09).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...

2015-01-09 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/3823#issuecomment-69429218
  
Ok, I'm merging this into master since the tests are irrelevant here. Thanks.





[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3823#issuecomment-69431301
  
  [Test build #25347 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25347/consoleFull)
 for   PR 3823 at commit 
[`b1ab402`](https://github.com/apache/spark/commit/b1ab402a0a835a642c99064fc0fa3d4a320b8b94).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...

2015-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3823#issuecomment-69431314
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25347/
Test PASSed.





[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...

2015-01-09 Thread scwf
Github user scwf commented on a diff in the pull request:

https://github.com/apache/spark/pull/3431#discussion_r22754925
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -46,6 +46,33 @@ trait RelationProvider {
 
 /**
  * ::DeveloperApi::
+ * Implemented by objects that produce relations for a specific kind of 
data source.  When
+ * Spark SQL is given a DDL operation with
+ * 1. USING clause: to specify the implemented SchemaRelationProvider
 + * 2. User defined schema: users can optionally define a schema when creating a table
+ *
+ * Users may specify the fully qualified class name of a given data 
source.  When that class is
+ * not found Spark SQL will append the class name `DefaultSource` to the 
path, allowing for
+ * less verbose invocation.  For example, 'org.apache.spark.sql.json' 
would resolve to the
+ * data source 'org.apache.spark.sql.json.DefaultSource'
+ *
 + * A new instance of this class will be instantiated each time a DDL call is made.
+ */
+@DeveloperApi
+trait SchemaRelationProvider {
+  /**
+   * Returns a new base relation with the given parameters and user 
defined schema.
+   * Note: the parameters' keywords are case insensitive and this 
insensitivity is enforced
+   * by the Map that is passed to the function.
+   */
+  def createRelation(
+  sqlContext: SQLContext,
+  parameters: Map[String, String],
+  schema: Option[StructType]): BaseRelation
--- End diff --

My initial idea was to stay compatible with the old trait; since we will have 
two traits, I will fix this.





[GitHub] spark pull request: [SPARK-5141][SQL]CaseInsensitiveMap throws jav...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3944#issuecomment-69435265
  
  [Test build #25352 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25352/consoleFull)
 for   PR 3944 at commit 
[`b6d63d5`](https://github.com/apache/spark/commit/b6d63d5b91cc2e558ecd5b984d312aa0ee9d6f32).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `protected class CaseInsensitiveMap(map: Map[String, String]) extends 
Map[String, String] `






[GitHub] spark pull request: [SPARK-5141][SQL]CaseInsensitiveMap throws jav...

2015-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3944#issuecomment-69435266
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25352/
Test PASSed.





[GitHub] spark pull request: [SPARK-4943][SQL] Allow table name having dot ...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3941#issuecomment-69435983
  
  [Test build #25356 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25356/consoleFull)
 for   PR 3941 at commit 
[`343ae27`](https://github.com/apache/spark/commit/343ae27959bcccd20b7360c9a050eb297a181e14).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5174][SPARK-5175] provide more APIs in ...

2015-01-09 Thread sarutak
Github user sarutak commented on a diff in the pull request:

https://github.com/apache/spark/pull/3984#discussion_r22755829
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/receiver/ActorReceiver.scala
 ---
@@ -149,43 +181,61 @@ private[streaming] class ActorReceiver[T: ClassTag](
   class Supervisor extends Actor {
 
 override val supervisorStrategy = receiverSupervisorStrategy
-val worker = context.actorOf(props, name)
-logInfo("Started receiver worker at: " + worker.path)
-
-val n: AtomicInteger = new AtomicInteger(0)
-val hiccups: AtomicInteger = new AtomicInteger(0)
-
--- End diff --

I don't think it's single-threaded. Multiple threads can access the 
Supervisor. Even if no two threads access it at the same time, there is still 
a memory-visibility problem.
Or, how about marking those fields as volatile?





[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3431#issuecomment-69436041
  
  [Test build #25354 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25354/consoleFull)
 for   PR 3431 at commit 
[`7e79ce5`](https://github.com/apache/spark/commit/7e79ce5f80003fab657458cd9e79f4be85319aaa).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait SchemaRelationProvider `






[GitHub] spark pull request: [SPARK-5188][BUILD] make-distribution.sh shoul...

2015-01-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3988#discussion_r22755950
  
--- Diff: build/mvn ---
@@ -48,11 +48,11 @@ install_app() {
 # check if we already have the tarball
 # check if we have curl installed
 # download application
-[ ! -f ${local_tarball} ] && [ -n `which curl 2>/dev/null` ] && \
+[ ! -f ${local_tarball} ] && [ -n `type curl 2>/dev/null` ] && \
--- End diff --

FWIW, the approach recommended in [this 
answer](http://stackoverflow.com/a/677212/877069), which I agree with, is to 
use `command -v`, though honestly one way or the other it doesn't seem like a 
big deal.
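
For illustration, the portable form in spirit (a tiny sketch; `command -v` is specified by POSIX and is a shell builtin, whereas `which` is an external program whose behavior varies across systems):

```shell
# Portable existence check with POSIX `command -v`, instead of `which`.
have() { command -v "$1" > /dev/null 2>&1; }

have sh && echo "sh is available"          # sh exists on any POSIX system
have no-such-tool || echo "no-such-tool is missing"
```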





[GitHub] spark pull request: [SPARK-5174][SPARK-5175] provide more APIs in ...

2015-01-09 Thread CodingCat
Github user CodingCat commented on a diff in the pull request:

https://github.com/apache/spark/pull/3984#discussion_r22756102
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/receiver/ActorReceiver.scala
 ---
@@ -149,43 +181,61 @@ private[streaming] class ActorReceiver[T: ClassTag](
   class Supervisor extends Actor {
 
 override val supervisorStrategy = receiverSupervisorStrategy
-val worker = context.actorOf(props, name)
-logInfo("Started receiver worker at: " + worker.path)
-
-val n: AtomicInteger = new AtomicInteger(0)
-val hiccups: AtomicInteger = new AtomicInteger(0)
-
--- End diff --

Hmm... because the supervisor is implemented as an actor, `n` and `hiccups` 
are maintained as actor state and are only accessed from the message handler, 
so I don't think they can be accessed by multiple threads concurrently. Did I 
miss something in the code?
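
For reference, a minimal sketch of the pattern under discussion (illustrative names, not the actual ActorReceiver code): Akka processes an actor's messages one at a time and guarantees memory visibility between the processing of successive messages, so state touched only inside `receive` needs neither `AtomicInteger` nor `volatile`.

```scala
import akka.actor.Actor

// Sketch only: a supervisor-like actor keeping counters as plain vars.
// Safe because Akka serializes message processing per actor and
// establishes a happens-before edge between successive messages.
class CountingSupervisor extends Actor {
  private var n = 0        // messages handled so far
  private var hiccups = 0  // failures observed so far

  def receive = {
    case "data"  => n += 1
    case "error" => hiccups += 1
  }
}
```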







[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3951#issuecomment-69440276
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25357/
Test PASSed.





[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3951#issuecomment-69440274
  
  [Test build #25357 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25357/consoleFull)
 for   PR 3951 at commit 
[`a34bec5`](https://github.com/apache/spark/commit/a34bec5c0fec8416168836f58b98a6fa046c3a8d).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class GradientBoostedTreesModel(JavaModelWrapper):`
  * `class GradientBoostedTrees(object):`






[GitHub] spark pull request: [SPARK-5187][SQL] Fix caching of tables with H...

2015-01-09 Thread sarutak
Github user sarutak commented on a diff in the pull request:

https://github.com/apache/spark/pull/3987#discussion_r22753114
  
--- Diff: 
sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim13.scala ---
@@ -53,7 +53,7 @@ import scala.language.implicitConversions
  *
  * @param functionClassName UDF class name
  */
-class HiveFunctionWrapper(var functionClassName: String) extends 
java.io.Externalizable {
+case class HiveFunctionWrapper(var functionClassName: String) extends 
java.io.Externalizable {
--- End diff --

nit: should `functionClassName` still be a `var`?





[GitHub] spark pull request: [SPARK-5187][SQL] Fix caching of tables with H...

2015-01-09 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3987#discussion_r22753194
  
--- Diff: 
sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim13.scala ---
@@ -53,7 +53,7 @@ import scala.language.implicitConversions
  *
  * @param functionClassName UDF class name
  */
-class HiveFunctionWrapper(var functionClassName: String) extends 
java.io.Externalizable {
+case class HiveFunctionWrapper(var functionClassName: String) extends 
java.io.Externalizable {
--- End diff --

Yeah, it's mutated below by our custom deserialization.
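
For context, a minimal sketch of why the field must stay a `var` (a hypothetical class mirroring the `Externalizable` pattern, not the actual `HiveFunctionWrapper` code): `readExternal` runs after the no-arg constructor and must be able to assign the field.

```scala
import java.io.{Externalizable, ObjectInput, ObjectOutput}

// Sketch only: Externalizable requires a public no-arg constructor, and
// readExternal repopulates the instance afterwards, so the field must be
// mutable.
class FunctionWrapper(var functionClassName: String) extends Externalizable {
  def this() = this(null)  // required by the Externalizable contract

  override def writeExternal(out: ObjectOutput): Unit =
    out.writeUTF(functionClassName)

  override def readExternal(in: ObjectInput): Unit =
    functionClassName = in.readUTF()  // mutation: hence `var`
}
```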






[GitHub] spark pull request: [SPARK-4697][YARN]System properties should ove...

2015-01-09 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/3557#issuecomment-69427688
  
I mean it's unexpected because they're different when they should be the 
same (in that case, the value of `SPARK_YARN_APP_NAME`).





[GitHub] spark pull request: [SPARK-4990][Deploy]to find default properties...

2015-01-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3823





[GitHub] spark pull request: [SPARK-4697][YARN]System properties should ove...

2015-01-09 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request:

https://github.com/apache/spark/pull/3557#issuecomment-69431870
  
Oh, I see. But after this patch `SPARK_YARN_APP_NAME` becomes useless. It 
will make the behavior in client and cluster mode the same.
Note that this happens when we don't set the app name in SparkConf. Otherwise 
it is a different issue, described in 
[SPARK-3678](https://issues.apache.org/jira/browse/SPARK-3678). Perhaps we 
should file a separate PR to solve that.





[GitHub] spark pull request: [SPARK-4697][YARN]System properties should ove...

2015-01-09 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/3557#issuecomment-69431972
  
> But after this patch SPARK_YARN_APP_NAME becomes useless

That's why we should not commit this patch.





[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...

2015-01-09 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/3431#issuecomment-69433129
  
@scwf I have done it and will have a PR to your branch.





[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...

2015-01-09 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/3431#issuecomment-69433190
  
https://github.com/scwf/spark/pull/22






[GitHub] spark pull request: [branch-1.0][SPARK-4355] ColumnStatisticsAggre...

2015-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3850#issuecomment-69437984
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25355/
Test FAILed.





[GitHub] spark pull request: [branch-1.0][SPARK-4355] ColumnStatisticsAggre...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3850#issuecomment-69437979
  
  [Test build #25355 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25355/consoleFull)
 for   PR 3850 at commit 
[`ae9b94a`](https://github.com/apache/spark/commit/ae9b94a3f817759ee6249af991beec7e19e52f12).
 * This patch **fails some tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5174][SPARK-5175] provide more APIs in ...

2015-01-09 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/3984#discussion_r22756612
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/receiver/ActorReceiver.scala
 ---
@@ -149,43 +181,61 @@ private[streaming] class ActorReceiver[T: ClassTag](
   class Supervisor extends Actor {
 
 override val supervisorStrategy = receiverSupervisorStrategy
-val worker = context.actorOf(props, name)
-logInfo("Started receiver worker at:" + worker.path)
-
-val n: AtomicInteger = new AtomicInteger(0)
-val hiccups: AtomicInteger = new AtomicInteger(0)
-
--- End diff --

Correct, volatile is not necessary.  
https://groups.google.com/forum/#!msg/scalaz/kFnICLFjO-4/GT_59mZLrFAJ





[GitHub] spark pull request: [SPARK-5141][SQL]CaseInsensitiveMap throws jav...

2015-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3944#issuecomment-69428905
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-5006][Deploy]spark.port.maxRetries does...

2015-01-09 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request:

https://github.com/apache/spark/pull/3841#issuecomment-69431008
  
@andrewor14 Yeah it is an alternative. I will try it on Monday. Thanks.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-69431073
  
  [Test build #25351 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25351/consoleFull)
 for   PR 3916 at commit 
[`fc6a3e2`](https://github.com/apache/spark/commit/fc6a3e2597220907602f320fd2aebe43564a7461).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3541][MLLIB] New ALS implementation wit...

2015-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3720#issuecomment-69434737
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25353/
Test FAILed.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-69435868
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25350/
Test PASSed.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-69435864
  
  [Test build #25350 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25350/consoleFull)
 for   PR 3916 at commit 
[`f26556b`](https://github.com/apache/spark/commit/f26556b498cdae3fa23ea5837d673b4f5cb98c58).
 * This patch **passes all tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark pull request: [Minor] make-distribution.sh using build/mvn

2015-01-09 Thread witgo
Github user witgo closed the pull request at:

https://github.com/apache/spark/pull/3867





[GitHub] spark pull request: [SPARK-5174][SPARK-5175] provide more APIs in ...

2015-01-09 Thread sarutak
Github user sarutak commented on a diff in the pull request:

https://github.com/apache/spark/pull/3984#discussion_r22756477
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/receiver/ActorReceiver.scala
 ---
@@ -149,43 +181,61 @@ private[streaming] class ActorReceiver[T: ClassTag](
   class Supervisor extends Actor {
 
 override val supervisorStrategy = receiverSupervisorStrategy
-val worker = context.actorOf(props, name)
-logInfo("Started receiver worker at:" + worker.path)
-
-val n: AtomicInteger = new AtomicInteger(0)
-val hiccups: AtomicInteger = new AtomicInteger(0)
-
--- End diff --

If you log the current thread name in `receive`, you can see that multiple 
threads access `receive`.





[GitHub] spark pull request: [SPARK-5174][SPARK-5175] provide more APIs in ...

2015-01-09 Thread sarutak
Github user sarutak commented on a diff in the pull request:

https://github.com/apache/spark/pull/3984#discussion_r22756617
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/receiver/ActorReceiver.scala
 ---
@@ -149,43 +181,61 @@ private[streaming] class ActorReceiver[T: ClassTag](
   class Supervisor extends Actor {
 
 override val supervisorStrategy = receiverSupervisorStrategy
-val worker = context.actorOf(props, name)
-logInfo("Started receiver worker at:" + worker.path)
-
-val n: AtomicInteger = new AtomicInteger(0)
-val hiccups: AtomicInteger = new AtomicInteger(0)
-
--- End diff --

I see, Akka's actor model ensures the visibility.
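
A way to see the guarantee under discussion: the dispatcher runs `receive` for at most one message at a time and establishes happens-before between consecutive messages, so plain `var` counters are safe even though different threads may execute `receive` over the actor's lifetime. A rough model of that mailbox semantics using a single-threaded executor (a sketch with hypothetical names, not Akka itself):

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Sketch: model an actor mailbox with a single-threaded executor.
// Tasks run one at a time and the executor orders them, so the plain
// vars below need no AtomicInteger -- the same argument that applies
// to the Supervisor actor's n and hiccups counters.
class SupervisorModel {
  private val mailbox = Executors.newSingleThreadExecutor()
  private var n = 0        // messages seen
  private var hiccups = 0  // error notifications seen

  def send(msg: Any): Unit = mailbox.execute(new Runnable {
    def run(): Unit = receive(msg)  // never runs concurrently with itself
  })

  private def receive(msg: Any): Unit = msg match {
    case "hiccup" => hiccups += 1
    case _        => n += 1
  }

  // shutdown + awaitTermination gives the caller a happens-before edge,
  // so the final counts can be read safely from another thread.
  def drain(): (Int, Int) = {
    mailbox.shutdown()
    mailbox.awaitTermination(10, TimeUnit.SECONDS)
    (n, hiccups)
  }
}
```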





[GitHub] spark pull request: [SPARK-5141][SQL]CaseInsensitiveMap throws jav...

2015-01-09 Thread luogankun
GitHub user luogankun reopened a pull request:

https://github.com/apache/spark/pull/3944

[SPARK-5141][SQL]CaseInsensitiveMap throws java.io.NotSerializableException

CaseInsensitiveMap throws java.io.NotSerializableException.
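
The pattern behind this kind of fix, sketched with a hypothetical class (not the actual patch): the wrapper gets captured in a serialized task closure, so it must itself be `Serializable`; lowercasing keys on construction and on lookup gives the case-insensitive behavior.

```scala
// Sketch: a serializable case-insensitive lookup. Without
// "extends Serializable", shipping an instance inside a task closure
// fails with java.io.NotSerializableException.
class CaseInsensitiveLookup(options: Map[String, String]) extends Serializable {
  // normalize keys once; an immutable Map of Strings serializes fine
  private val baseMap = options.map { case (k, v) => (k.toLowerCase, v) }

  def get(key: String): Option[String] = baseMap.get(key.toLowerCase)
}
```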

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/luogankun/spark SPARK-5141

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3944.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3944


commit b6d63d5b91cc2e558ecd5b984d312aa0ee9d6f32
Author: luogankun luogan...@gmail.com
Date:   2015-01-08T08:19:23Z

[SPARK-5141]CaseInsensitiveMap throws java.io.NotSerializableException







[GitHub] spark pull request: [SPARK-5187][SQL] Fix caching of tables with H...

2015-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3987#issuecomment-69428763
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25346/
Test PASSed.





[GitHub] spark pull request: [SPARK-5187][SQL] Fix caching of tables with H...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3987#issuecomment-69428754
  
  [Test build #25346 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25346/consoleFull)
 for   PR 3987 at commit 
[`8bca2fa`](https://github.com/apache/spark/commit/8bca2faccb53bc91cfc534f06fe8c0b25d6b4c61).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class HiveFunctionWrapper(functionClassName: String) extends java.io.Serializable `
  * `case class HiveFunctionWrapper(var functionClassName: String) extends java.io.Externalizable `






[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-69430581
  
  [Test build #25350 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25350/consoleFull)
 for   PR 3916 at commit 
[`f26556b`](https://github.com/apache/spark/commit/f26556b498cdae3fa23ea5837d673b4f5cb98c58).
 * This patch **does not merge cleanly**.





[GitHub] spark pull request: [SPARK-4697][YARN]System properties should ove...

2015-01-09 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request:

https://github.com/apache/spark/pull/3557#issuecomment-69432493
  
Or we could just remove SPARK_YARN_APP_NAME altogether? The env variable is 
not recommended and it causes different behavior. Users can still use 
`spark.app.name`.

Or should we make SPARK_YARN_APP_NAME and spark.app.name a special case in 
`YarnClientSchedulerBackend.scala`?

What do you two think? @tgravescs @andrewor14 





[GitHub] spark pull request: [SPARK-4406] [MLib] FIX: Validate k in SVD

2015-01-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3945





[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...

2015-01-09 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/3431#issuecomment-69433261
  
ok, merged!





[GitHub] spark pull request: [SPARK-5188][BUILD] make-distribution.sh shoul...

2015-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3988#issuecomment-69433277
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25349/
Test PASSed.





[GitHub] spark pull request: [SPARK-4749] [mllib]: Allow initializing KMean...

2015-01-09 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3610#issuecomment-69433216
  
@nxwhite-str There are few minor comments left. Do you have time to update 
the PR?





[GitHub] spark pull request: [SPARK-5188][BUILD] make-distribution.sh shoul...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3988#issuecomment-69433274
  
  [Test build #25349 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25349/consoleFull)
 for   PR 3988 at commit 
[`83b49b5`](https://github.com/apache/spark/commit/83b49b5e2def5df861c21cad1c6c72be3a460e09).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3431#issuecomment-69433301
  
  [Test build #25354 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25354/consoleFull)
 for   PR 3431 at commit 
[`7e79ce5`](https://github.com/apache/spark/commit/7e79ce5f80003fab657458cd9e79f4be85319aaa).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3541][MLLIB] New ALS implementation wit...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3720#issuecomment-69434732
  
  [Test build #25353 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25353/consoleFull)
 for   PR 3720 at commit 
[`dd0d0e8`](https://github.com/apache/spark/commit/dd0d0e8ecd36b9e607306dd170d1e22437180389).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class Rating(userId: Int, movieId: Int, rating: Float, timestamp: Long)`
  * `  case class Movie(movieId: Int, title: String, genres: Seq[String])`
  * `  case class Params(`
  * `class ALS extends Estimator[ALSModel] with ALSParams `
  * `  case class RatingBlock(srcIds: Array[Int], dstIds: Array[Int], ratings: Array[Float]) `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4943][SQL] Allow table name having dot ...

2015-01-09 Thread alexliu68
Github user alexliu68 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3941#discussion_r22755728
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -178,10 +178,23 @@ class SqlParser extends AbstractSparkSQLParser {
 joinedRelation | relationFactor
 
   protected lazy val relationFactor: Parser[LogicalPlan] =
-( ident ~ (opt(AS) ~> opt(ident)) ^^ {
-    case tableName ~ alias => UnresolvedRelation(None, tableName, alias)
+(
+  ident ~ ("." ~> ident) ~ ("." ~> ident) ~ ("." ~> ident) ~ (opt(AS) ~> opt(ident)) ^^ {
+    case reserveName1 ~ reserveName2 ~ dbName ~ tableName ~ alias =>
+      UnresolvedRelation(IndexedSeq(tableName, dbName, reserveName2, reserveName1), alias)
   }
-| ("(" ~> start <~ ")") ~ (AS.? ~> ident) ^^ { case s ~ a => Subquery(a, s) }
+  | ident ~ ("." ~> ident) ~ ("." ~> ident) ~ (opt(AS) ~> opt(ident)) ^^ {
+    case reserveName1 ~ dbName ~ tableName ~ alias =>
+      UnresolvedRelation(IndexedSeq(tableName, dbName, reserveName1), alias)
+  }
+  | ident ~ ("." ~> ident) ~ (opt(AS) ~> opt(ident)) ^^ {
+      case dbName ~ tableName ~ alias =>
+        UnresolvedRelation(IndexedSeq(tableName, dbName), alias)
+    }
+  | ident ~ (opt(AS) ~> opt(ident)) ^^ {
+      case tableName ~ alias => UnresolvedRelation(IndexedSeq(tableName), alias)
--- End diff --

I changed it to `rep1sep(ident, ".")`
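
For reference, `rep1sep(ident, ".")` matches one or more `ident`s separated by `"."` and yields them as a `List`, which collapses the depth-specific alternatives above into a single production. A standalone sketch with Scala's parser combinators (simplified `ident` regex and hypothetical object name, not the real `SqlParser`):

```scala
import scala.util.parsing.combinator.RegexParsers

// Sketch: parse dotted, optionally aliased relation names such as
// "catalog.db.table AS t" into (nameParts, alias).
object RelationParserSketch extends RegexParsers {
  private val ident: Parser[String] = """[a-zA-Z_][a-zA-Z0-9_]*""".r

  // rep1sep(ident, ".") == ident ("." ident)*  -- any qualification depth
  val relation: Parser[(List[String], Option[String])] =
    rep1sep(ident, ".") ~ opt(opt("AS") ~> ident) ^^ {
      case parts ~ alias => (parts, alias)
    }

  def apply(s: String) = parseAll(relation, s)
}
```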





[GitHub] spark pull request: [SPARK-4943][SQL] Allow table name having dot ...

2015-01-09 Thread alexliu68
Github user alexliu68 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3941#discussion_r22755736
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala
 ---
@@ -115,43 +101,41 @@ class SimpleCatalog(val caseSensitive: Boolean) extends Catalog {
 trait OverrideCatalog extends Catalog {
 
   // TODO: This doesn't work when the database changes...
-  val overrides = new mutable.HashMap[(Option[String],String), LogicalPlan]()
+  val overrides = new mutable.HashMap[String, LogicalPlan]()
--- End diff --

Restored it to `(Option[String], String)`.





[GitHub] spark pull request: [SPARK-5141][SQL]CaseInsensitiveMap throws jav...

2015-01-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3944





[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-09 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/3798#issuecomment-69446001
  
I went ahead and implemented locality and checkpointing of generated rdds.
Couple of points

- still depends on SPARK-4014 eventually being merged, for efficiency's
sake.

- I ran into classloader / class not found issues trying to checkpoint
KafkaRDDPartition directly.  Current solution is to transform them to/from
tuples, ugly but it works.  If you know what the issue is there, let me
know.

- I've got a use case that requires overriding the compute method on the
DStream (basically, modifying offsets to a fixed delay rather than now).
I'm assuming you'd prefer a user supplied function to do the transformation
rather than subclassing, but let me know.

On Mon, Jan 5, 2015 at 7:59 PM, Tathagata Das notificati...@github.com
wrote:

 Great! Keep me posted.

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/3798#issuecomment-68815205.






[GitHub] spark pull request: [SPARK-1953][YARN]yarn client mode Application...

2015-01-09 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request:

https://github.com/apache/spark/pull/3607#issuecomment-69426979
  
Oh gosh, it is finally merged.
Thanks, guys, for the persistent comments. @andrewor14 @tgravescs @vanzin @sryza 





[GitHub] spark pull request: [SPARK-5141][SQL]CaseInsensitiveMap throws jav...

2015-01-09 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/3944#issuecomment-69432161
  
ok to test





[GitHub] spark pull request: [SPARK-4983]Tag EC2 instances in the same call...

2015-01-09 Thread nchammas
Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/3986#issuecomment-69437259
  
By the way, please also update the title of this PR to match the approach 
you are taking, since as you noted we can't actually use the same call to 
launch and tag instances. You can leave the JIRA tag at the beginning as-is.





[GitHub] spark pull request: [SPARK-5168] Make SQLConf a field rather than ...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3965#issuecomment-69445304
  
  [Test build #25358 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25358/consoleFull)
 for   PR 3965 at commit 
[`42411e0`](https://github.com/apache/spark/commit/42411e002d729f855e33f0da61ab2bd4f0f65b24).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5168] Make SQLConf a field rather than ...

2015-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3965#issuecomment-69445305
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25358/
Test PASSed.





[GitHub] spark pull request: [SPARK-4697][YARN]System properties should ove...

2015-01-09 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request:

https://github.com/apache/spark/pull/3557#issuecomment-69426504
  
@vanzin  Which one do you mean? Client mode or cluster mode?
@tgravescs I looked at the name on the RM's UI.
I checked SPARK-3678 and realized that if there is no `spark.app.name` in the 
configuration file and no `--name` in the command args, cluster mode will use 
`mainClass`.
But in client mode, because we usually call `SparkConf.setAppName` in 
application code, the RM's UI shows whatever we set there. 





[GitHub] spark pull request: [SPARK-4697][YARN]System properties should ove...

2015-01-09 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/3557#issuecomment-69430819
  
I understand all that. I'm saying that's unexpected, in that I'd expect 
both modes to behave the same. So if there's anything to fix here, that's it.





[GitHub] spark pull request: [SPARK-4697][YARN]System properties should ove...

2015-01-09 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request:

https://github.com/apache/spark/pull/3557#issuecomment-69430615
  
I am afraid it is not.
In cluster mode, the app name in `SparkSubmitArguments.scala` will be assigned 
`mainClass` if neither `spark.app.name` nor `--name` is specified, since it 
does not read `SPARK_YARN_APP_NAME`.
Then `SparkSubmit.scala` will pass `args.name`, in the form of 
`spark.app.name` and `--name`, to `org.apache.spark.deploy.yarn.Client`.

`yarn.Client.scala` transforms the args into a `ClientArguments` object, in 
which `appName` only gets its value from `--name`.

So, in this process, the app name never gets its value from the env variable 
`SPARK_YARN_APP_NAME`; that variable is only used in client mode.





[GitHub] spark pull request: [SPARK-4697][YARN]System properties should ove...

2015-01-09 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/3557#issuecomment-69432664
  
Although in general we should honor Spark properties over environment 
variables, the app name has been a special case and should remain so for 
backward compatibility. For this PR, I think the goal is to maintain behavior 
in the before table by making more changes in `YarnClientSchedulerBackend`.

Additionally, it is not intuitive that if you set both 
`SPARK_YARN_APP_NAME` and `spark.app.name`, the behavior is inconsistent 
between client mode and cluster mode. I think the app name should be a special 
case for both deploy modes, but we can fix that in a separate PR.





[GitHub] spark pull request: [SPARK-4943][SQL] Allow table name having dot ...

2015-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3941#issuecomment-69439434
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25356/
Test PASSed.





[GitHub] spark pull request: [SPARK-4406] [MLib] FIX: Validate k in SVD

2015-01-09 Thread MechCoder
Github user MechCoder commented on the pull request:

https://github.com/apache/spark/pull/3945#issuecomment-69445130
  
@jkbradley @mengxr Thanks for the quick reviews and merge. Looking to 
contribute more.





[GitHub] spark pull request: [Minor] Remove permission for execution from s...

2015-01-09 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/3983#issuecomment-69428110
  
BTW, I was testing some things on Windows the other day, and the 644 
permissions did turn out to be an issue. Probably because I was rsyncing the 
files from a Linux host and rsync would then translate the permissions to not 
allow them to be executable on the Windows side...





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-69435466
  
  [Test build #25351 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25351/consoleFull)
 for   PR 3916 at commit 
[`fc6a3e2`](https://github.com/apache/spark/commit/fc6a3e2597220907602f320fd2aebe43564a7461).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-69435468
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25351/
Test FAILed.





[GitHub] spark pull request: [SPARK-5188][BUILD] make-distribution.sh shoul...

2015-01-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3988#discussion_r22755934
  
--- Diff: build/mvn ---
@@ -48,11 +48,11 @@ install_app() {
 # check if we already have the tarball
 # check if we have curl installed
 # download application
-[ ! -f "${local_tarball}" ] && [ -n "`which curl 2>/dev/null`" ] && \
+[ ! -f "${local_tarball}" ] && [ -n "`type curl 2>/dev/null`" ] && \
--- End diff --

Why are we replacing `which` with `type`? What's the difference between the 
two commands?





[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3951#issuecomment-69436744
  
  [Test build #25357 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25357/consoleFull)
 for   PR 3951 at commit 
[`a34bec5`](https://github.com/apache/spark/commit/a34bec5c0fec8416168836f58b98a6fa046c3a8d).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4943][SQL] Allow table name having dot ...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3941#issuecomment-69439432
  
  [Test build #25356 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25356/consoleFull)
 for   PR 3941 at commit 
[`343ae27`](https://github.com/apache/spark/commit/343ae27959bcccd20b7360c9a050eb297a181e14).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5174][SPARK-5175] provide more APIs in ...

2015-01-09 Thread CodingCat
Github user CodingCat commented on a diff in the pull request:

https://github.com/apache/spark/pull/3984#discussion_r22756597
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/receiver/ActorReceiver.scala
 ---
@@ -149,43 +181,61 @@ private[streaming] class ActorReceiver[T: ClassTag](
   class Supervisor extends Actor {
 
 override val supervisorStrategy = receiverSupervisorStrategy
-val worker = context.actorOf(props, name)
-logInfo("Started receiver worker at:" + worker.path)
-
-val n: AtomicInteger = new AtomicInteger(0)
-val hiccups: AtomicInteger = new AtomicInteger(0)
-
--- End diff --

Hi, @sarutak, I went back to Akka's documentation, 
http://doc.akka.io/docs/akka/snapshot/general/jmm.html (Actors and the Java 
Memory Model). I think they state that 

 "internal fields of the actor are visible when the next message is 
processed by that actor. So fields in your actor need not be volatile or 
equivalent." 

So, we don't need to explicitly mark these variables as volatile?
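The visibility rule quoted above can be mimicked outside Akka with a single-consumer mailbox. The toy Python "actor" below (illustrative names only, not Akka's API) shows why a plain field suffices when exactly one thread ever processes the messages:

```python
import queue
import threading

class CounterActor:
    """Toy actor: one mailbox, one worker thread, plain (non-atomic) state."""

    def __init__(self):
        self.n = 0                  # plain int: no AtomicInteger equivalent needed
        self._mailbox = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Single consumer: all state mutation happens on this one thread,
        # and the queue's put/get hand-off orders writes between messages.
        while True:
            msg = self._mailbox.get()
            if msg is None:         # poison pill: stop processing
                return
            self.n += 1             # safe without synchronization

    def tell(self, msg):
        self._mailbox.put(msg)      # any thread may enqueue concurrently

    def stop(self):
        self._mailbox.put(None)
        self._worker.join()

actor = CounterActor()
# Four concurrent senders, 250 messages each.
producers = [threading.Thread(target=lambda: [actor.tell("tick") for _ in range(250)])
             for _ in range(4)]
for p in producers:
    p.start()
for p in producers:
    p.join()
actor.stop()
print(actor.n)  # 1000
```

The mailbox hand-off provides the happens-before edge, which is the same property the Akka docs describe for actor-internal fields.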








[GitHub] spark pull request: [SPARK-5141][SQL]CaseInsensitiveMap throws jav...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3944#issuecomment-69432309
  
  [Test build #25352 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25352/consoleFull)
 for   PR 3944 at commit 
[`b6d63d5`](https://github.com/apache/spark/commit/b6d63d5b91cc2e558ecd5b984d312aa0ee9d6f32).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-5018 [MLlib] [WIP] Make MultivariateGaus...

2015-01-09 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3923#discussion_r22755113
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/distribution/MultivariateGaussian.scala
 ---
@@ -30,33 +32,68 @@ import org.apache.spark.mllib.util.MLUtils
  * @param mu The mean vector of the distribution
  * @param sigma The covariance matrix of the distribution
  */
-private[mllib] class MultivariateGaussian(
-val mu: DBV[Double], 
-val sigma: DBM[Double]) extends Serializable {
+@DeveloperApi
+class MultivariateGaussian private[mllib] (
+private[mllib] val mu: DBV[Double], 
--- End diff --

Instead of having `mu`/`sigma` private and adding getters, could we make them 
MLlib vector/matrix types and add private members of the breeze types? Then we 
can make this constructor public and remove the getters. The overhead is small 
because we don't copy the data arrays.





[GitHub] spark pull request: SPARK-5018 [MLlib] [WIP] Make MultivariateGaus...

2015-01-09 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3923#issuecomment-69433464
  
@tgaloppo Besides inline comments, please resolve conflicts with the master 
branch. The patch does not merge cleanly.





[GitHub] spark pull request: [branch-1.0][SPARK-4355] ColumnStatisticsAggre...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3850#issuecomment-69434441
  
  [Test build #25355 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25355/consoleFull)
 for   PR 3850 at commit 
[`ae9b94a`](https://github.com/apache/spark/commit/ae9b94a3f817759ee6249af991beec7e19e52f12).
 * This patch merges cleanly.





[GitHub] spark pull request: [branch-1.0][SPARK-4355] ColumnStatisticsAggre...

2015-01-09 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3850#issuecomment-69434332
  
test this please





[GitHub] spark pull request: [SPARK-4574][SQL] Adding support for defining ...

2015-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3431#issuecomment-69436044
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25354/
Test PASSed.





[GitHub] spark pull request: [SPARK-5141][SQL]CaseInsensitiveMap throws jav...

2015-01-09 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/3944#issuecomment-69438903
  
Oh, since users may pass the ```CaseInsensitiveMap``` into a scan builder 
relation, making it ```Serializable``` is more robust. This LGTM.





[GitHub] spark pull request: [SPARK-5174][SPARK-5175] provide more APIs in ...

2015-01-09 Thread CodingCat
Github user CodingCat commented on a diff in the pull request:

https://github.com/apache/spark/pull/3984#discussion_r22756565
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/receiver/ActorReceiver.scala
 ---
@@ -149,43 +181,61 @@ private[streaming] class ActorReceiver[T: ClassTag](
   class Supervisor extends Actor {
 
 override val supervisorStrategy = receiverSupervisorStrategy
-val worker = context.actorOf(props, name)
-logInfo("Started receiver worker at:" + worker.path)
-
-val n: AtomicInteger = new AtomicInteger(0)
-val hiccups: AtomicInteger = new AtomicInteger(0)
-
--- End diff --

I see what you mean... yes, you're correct, since the running thread of the 
actor can be changed before the updated value is written back to memory.





[GitHub] spark pull request: [SPARK-5168] Make SQLConf a field rather than ...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3965#issuecomment-69442589
  
  [Test build #25358 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25358/consoleFull)
 for   PR 3965 at commit 
[`42411e0`](https://github.com/apache/spark/commit/42411e002d729f855e33f0da61ab2bd4f0f65b24).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4983]Tag EC2 instances in the same call...

2015-01-09 Thread GenTang
Github user GenTang commented on a diff in the pull request:

https://github.com/apache/spark/pull/3986#discussion_r22751730
  
--- Diff: ec2/spark_ec2.py ---
@@ -569,15 +569,28 @@ def launch_cluster(conn, opts, cluster_name):
 master_nodes = master_res.instances
 print "Launched master in %s, regid = %s" % (zone, master_res.id)
 
-# Give the instances descriptive names
+# Give the instances descriptive names.
+# The code of handling exceptions corresponds to issue [SPARK-4983]
 for master in master_nodes:
-master.add_tag(
-key='Name',
-value='{cn}-master-{iid}'.format(cn=cluster_name, 
iid=master.id))
+while True:
+try:
+master.add_tag(
+key='Name',
+value='{cn}-master-{iid}'.format(cn=cluster_name, 
iid=master.id))
+except:
+pass
--- End diff --

I think that it takes some time for EC2 to stop returning an "instance does 
not exist" exception. That's why I left `pass` in the exception handler. 
However, maybe we should add a small wait time to ensure that we don't submit 
too many requests to EC2.

Yes, here we just want to catch the "instance does not exist" exception. You 
are right, it is better to use a specific exception. I will work on this.
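A stdlib-only sketch of the approach being discussed: retry only the specific, expected exception and sleep between attempts so EC2 isn't hammered. The exception class and the `add_tag` stub here are stand-ins for illustration, not boto's API:

```python
import time

class InstanceNotFound(Exception):
    """Stand-in for EC2's 'instance does not exist' error."""

def retry(fn, retriable=InstanceNotFound, attempts=5, wait=0.01):
    """Call fn(), retrying only the expected exception with a small pause.

    Any other exception propagates immediately; after the last attempt the
    retriable exception is re-raised instead of being silently swallowed.
    """
    for i in range(attempts):
        try:
            return fn()
        except retriable:
            if i == attempts - 1:
                raise
            time.sleep(wait)  # back off before asking EC2 again

# Simulate an instance that only becomes taggable on the third call.
calls = {"n": 0}
def add_tag():
    calls["n"] += 1
    if calls["n"] < 3:
        raise InstanceNotFound()
    return "tagged"

print(retry(add_tag))  # tagged
```

Compared with the quoted `while True: ... except: pass` loop, this bounds the number of attempts and never masks unrelated errors.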







[GitHub] spark pull request: [SPARK-4777][CORE] Some block memory after unr...

2015-01-09 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/3629#issuecomment-69422290
  
I see, thanks for your detailed explanations @suyanNone @liyezhang556520. 
If the problem is that we double count after we put the block in memory, 
shouldn't we also release the pending memory *after* we actually put the block 
(i.e. after [this 
line](https://github.com/apache/spark/blob/4e1f12d997426560226648d62ee17c90352613e7/core/src/main/scala/org/apache/spark/storage/MemoryStore.scala#L344)),
 not before?
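The ordering concern can be sketched abstractly (class and method names are illustrative, not Spark's MemoryStore API): if the pending unroll reservation were released before the block is accounted, the tracked total would briefly under-count the block and a concurrent reservation could over-commit; releasing after the put keeps the accounting conservative:

```python
import threading

class ToyMemoryStore:
    """Illustrative accounting only; not Spark's MemoryStore."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lock = threading.Lock()
        self.pending = 0  # reserved while a block is being unrolled
        self.used = 0     # blocks actually stored

    def reserve(self, size):
        with self.lock:
            if self.pending + self.used + size > self.capacity:
                return False
            self.pending += size
            return True

    def put_block(self, size):
        # Account the stored block *first*...
        with self.lock:
            self.used += size
        # ...then drop the reservation. Between the two steps the block is
        # (transiently) double-counted, which is safe; releasing before the
        # put would instead open a window where (pending + used) misses the
        # block and a concurrent reserve() could over-commit the store.
        with self.lock:
            self.pending -= size

store = ToyMemoryStore(capacity=100)
assert store.reserve(60)
store.put_block(60)
print(store.used, store.pending)  # 60 0
```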





[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...

2015-01-09 Thread tomerk
Github user tomerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/3637#discussion_r22752063
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/ml/DeveloperApiExample.scala 
---
@@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml
+
+import org.apache.spark.{SparkConf, SparkContext}
+import org.apache.spark.SparkContext._
+import org.apache.spark.ml.classification.{Classifier, ClassifierParams, 
ClassificationModel}
+import org.apache.spark.ml.param.{Params, IntParam, ParamMap}
+import org.apache.spark.mllib.linalg.{BLAS, Vector, Vectors, VectorUDT}
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.sql.{DataType, SchemaRDD, Row, SQLContext}
+
+/**
+ * A simple example demonstrating how to write your own learning algorithm 
using Estimator,
+ * Transformer, and other abstractions.
+ * This mimics [[org.apache.spark.ml.classification.LogisticRegression]].
+ * Run with
+ * {{{
+ * bin/run-example ml.DeveloperApiExample
+ * }}}
+ */
+object DeveloperApiExample {
+
+  def main(args: Array[String]) {
+val conf = new SparkConf().setAppName("DeveloperApiExample")
+val sc = new SparkContext(conf)
+val sqlContext = new SQLContext(sc)
+import sqlContext._
+
+// Prepare training data.
+val training = sparkContext.parallelize(Seq(
+  LabeledPoint(1.0, Vectors.dense(0.0, 1.1, 0.1)),
+  LabeledPoint(0.0, Vectors.dense(2.0, 1.0, -1.0)),
+  LabeledPoint(0.0, Vectors.dense(2.0, 1.3, 1.0)),
+  LabeledPoint(1.0, Vectors.dense(0.0, 1.2, -0.5))))
+
+// Create a LogisticRegression instance.  This instance is an Estimator.
+val lr = new MyLogisticRegression()
+// Print out the parameters, documentation, and any default values.
+println("MyLogisticRegression parameters:\n" + lr.explainParams() + "\n")
+
+// We may set parameters using setter methods.
+lr.setMaxIter(10)
+
+// Learn a LogisticRegression model.  This uses the parameters stored in lr.
+val model = lr.fit(training)
+
+// Prepare test data.
+val test = sparkContext.parallelize(Seq(
+  LabeledPoint(1.0, Vectors.dense(-1.0, 1.5, 1.3)),
+  LabeledPoint(0.0, Vectors.dense(3.0, 2.0, -0.1)),
+  LabeledPoint(1.0, Vectors.dense(0.0, 2.2, -1.5))))
+
+// Make predictions on test data.
+val sumPredictions: Double = model.transform(test)
+  .select('features, 'label, 'prediction)
+  .collect()
+  .map { case Row(features: Vector, label: Double, prediction: Double) =>
+prediction
+  }.sum
+assert(sumPredictions == 0.0,
+  "MyLogisticRegression predicted something other than 0, even though all weights are 0!")
+  }
+}
+
+/**
+ * Example of defining a parameter trait for a user-defined type of 
[[Classifier]].
+ *
+ * NOTE: This is private since it is an example.  In practice, you may not 
want it to be private.
+ */
+private trait MyLogisticRegressionParams extends ClassifierParams {
+
+  /** param for max number of iterations */
+  val maxIter: IntParam = new IntParam(this, "maxIter", "max number of iterations")
+  def getMaxIter: Int = get(maxIter)
+}
+
+/**
+ * Example of defining a type of [[Classifier]].
+ *
+ * NOTE: This is private since it is an example.  In practice, you may not 
want it to be private.
+ */
+private class MyLogisticRegression
+  extends Classifier[Vector, MyLogisticRegression, MyLogisticRegressionModel]
+  with MyLogisticRegressionParams {
+
+  setMaxIter(100) // Initialize
+
+  def setMaxIter(value: Int): this.type = set(maxIter, value)
+
+  override def fit(dataset: SchemaRDD, paramMap: ParamMap): MyLogisticRegressionModel = {
+// Check schema (types). This allows early failure before running the 
algorithm.
+

[GitHub] spark pull request: [SPARK-5187][SQL] Fix caching of tables with H...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3987#issuecomment-69422858
  
  [Test build #25346 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25346/consoleFull)
 for   PR 3987 at commit 
[`8bca2fa`](https://github.com/apache/spark/commit/8bca2faccb53bc91cfc534f06fe8c0b25d6b4c61).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...

2015-01-09 Thread tomerk
Github user tomerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/3637#discussion_r22752140
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/ml/DeveloperApiExample.scala 
---
@@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml
+
+import org.apache.spark.{SparkConf, SparkContext}
+import org.apache.spark.SparkContext._
+import org.apache.spark.ml.classification.{Classifier, ClassifierParams, ClassificationModel}
+import org.apache.spark.ml.param.{Params, IntParam, ParamMap}
+import org.apache.spark.mllib.linalg.{BLAS, Vector, Vectors, VectorUDT}
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.sql.{DataType, SchemaRDD, Row, SQLContext}
+
+/**
+ * A simple example demonstrating how to write your own learning algorithm 
using Estimator,
+ * Transformer, and other abstractions.
+ * This mimics [[org.apache.spark.ml.classification.LogisticRegression]].
+ * Run with
+ * {{{
+ * bin/run-example ml.DeveloperApiExample
+ * }}}
+ */
+object DeveloperApiExample {
+
+  def main(args: Array[String]) {
+val conf = new SparkConf().setAppName("DeveloperApiExample")
+val sc = new SparkContext(conf)
+val sqlContext = new SQLContext(sc)
+import sqlContext._
+
+// Prepare training data.
+val training = sparkContext.parallelize(Seq(
+  LabeledPoint(1.0, Vectors.dense(0.0, 1.1, 0.1)),
+  LabeledPoint(0.0, Vectors.dense(2.0, 1.0, -1.0)),
+  LabeledPoint(0.0, Vectors.dense(2.0, 1.3, 1.0)),
+  LabeledPoint(1.0, Vectors.dense(0.0, 1.2, -0.5))))
+
+// Create a LogisticRegression instance.  This instance is an Estimator.
+val lr = new MyLogisticRegression()
+// Print out the parameters, documentation, and any default values.
+println("MyLogisticRegression parameters:\n" + lr.explainParams() + "\n")
+
+// We may set parameters using setter methods.
+lr.setMaxIter(10)
+
+// Learn a LogisticRegression model.  This uses the parameters stored in lr.
+val model = lr.fit(training)
+
+// Prepare test data.
+val test = sparkContext.parallelize(Seq(
+  LabeledPoint(1.0, Vectors.dense(-1.0, 1.5, 1.3)),
+  LabeledPoint(0.0, Vectors.dense(3.0, 2.0, -0.1)),
+  LabeledPoint(1.0, Vectors.dense(0.0, 2.2, -1.5))))
+
+// Make predictions on test data.
+val sumPredictions: Double = model.transform(test)
+  .select('features, 'label, 'prediction)
+  .collect()
+  .map { case Row(features: Vector, label: Double, prediction: Double) =>
+prediction
+  }.sum
+assert(sumPredictions == 0.0,
+  "MyLogisticRegression predicted something other than 0, even though all weights are 0!")
+  }
+}
+
+/**
+ * Example of defining a parameter trait for a user-defined type of 
[[Classifier]].
+ *
+ * NOTE: This is private since it is an example.  In practice, you may not 
want it to be private.
+ */
+private trait MyLogisticRegressionParams extends ClassifierParams {
+
+  /** param for max number of iterations */
+  val maxIter: IntParam = new IntParam(this, "maxIter", "max number of iterations")
+  def getMaxIter: Int = get(maxIter)
+}
+
+/**
+ * Example of defining a type of [[Classifier]].
+ *
+ * NOTE: This is private since it is an example.  In practice, you may not 
want it to be private.
+ */
+private class MyLogisticRegression
+  extends Classifier[Vector, MyLogisticRegression, MyLogisticRegressionModel]
+  with MyLogisticRegressionParams {
+
+  setMaxIter(100) // Initialize
+
+  def setMaxIter(value: Int): this.type = set(maxIter, value)
+
+  override def fit(dataset: SchemaRDD, paramMap: ParamMap): MyLogisticRegressionModel = {
+// Check schema (types). This allows early failure before running the 
algorithm.
+

[GitHub] spark pull request: [SPARK-5187][SQL] Fix caching of tables with H...

2015-01-09 Thread marmbrus
GitHub user marmbrus opened a pull request:

https://github.com/apache/spark/pull/3987

[SPARK-5187][SQL] Fix caching of tables with HiveUDFs in the WHERE clause



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marmbrus/spark hiveUdfCaching

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3987.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3987


commit 8bca2faccb53bc91cfc534f06fe8c0b25d6b4c61
Author: Michael Armbrust mich...@databricks.com
Date:   2015-01-09T23:54:18Z

[SPARK-5187][SQL] Fix caching of tables with HiveUDFs in the WHERE clause




---



[GitHub] spark pull request: [SPARK-4777][CORE] Some block memory after unr...

2015-01-09 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/3629#issuecomment-69423665
  
Also, the other issue with this patch is that `unrollSafely` is not used 
exclusively with `tryToPut`; it is also used in 
`CacheManager#putInBlockManager`. If we acquire pending memory in 
`unrollSafely` and expect `tryToPut` to release it later, then we will never 
release the pending memory in the `CacheManager` case.
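The acquire/release imbalance described above can be sketched in isolation. The following is a toy model, not Spark's actual `MemoryStore`/`CacheManager` code (the object and method names here only mirror the discussion): if `unrollSafely` reserves "pending" memory that only `tryToPut` gives back, then any caller that invokes `unrollSafely` without going through `tryToPut` leaks the reservation.

```scala
// Hypothetical sketch of the accounting mismatch; not Spark's real API.
object PendingMemorySketch {
  private var pendingBytes: Long = 0L

  // Reserves memory for unrolling and leaves it "pending".
  def unrollSafely(bytes: Long): Unit = { pendingBytes += bytes }

  // Expected to release the pending reservation made by unrollSafely.
  def tryToPut(bytes: Long): Unit = { pendingBytes -= bytes }

  def main(args: Array[String]): Unit = {
    // Path 1: a putIterator-style flow pairs the two calls, so it balances.
    unrollSafely(100L)
    tryToPut(100L)
    assert(pendingBytes == 0L)

    // Path 2: a CacheManager-style caller invokes unrollSafely directly and
    // never reaches tryToPut, so the reservation is never released.
    unrollSafely(100L)
    assert(pendingBytes == 100L)
    println(s"leaked pending bytes: $pendingBytes") // prints "leaked pending bytes: 100"
  }
}
```

This is why releasing inside `unrollSafely` itself (or tracking the reservation per caller) is safer than relying on a later method to undo it.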





[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...

2015-01-09 Thread tomerk
Github user tomerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/3637#discussion_r22752339
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -80,69 +50,157 @@ class LogisticRegression extends Estimator[LogisticRegressionModel] with Logisti
 
   def setRegParam(value: Double): this.type = set(regParam, value)
   def setMaxIter(value: Int): this.type = set(maxIter, value)
-  def setLabelCol(value: String): this.type = set(labelCol, value)
   def setThreshold(value: Double): this.type = set(threshold, value)
-  def setFeaturesCol(value: String): this.type = set(featuresCol, value)
-  def setScoreCol(value: String): this.type = set(scoreCol, value)
-  def setPredictionCol(value: String): this.type = set(predictionCol, value)
 
   override def fit(dataset: SchemaRDD, paramMap: ParamMap): LogisticRegressionModel = {
+// Check schema
 transformSchema(dataset.schema, paramMap, logging = true)
-import dataset.sqlContext._
+
+// Extract columns from data.  If dataset is persisted, do not persist oldDataset.
+val oldDataset = extractLabeledPoints(dataset, paramMap)
 val map = this.paramMap ++ paramMap
-val instances = dataset.select(map(labelCol).attr, map(featuresCol).attr)
-  .map { case Row(label: Double, features: Vector) =>
-LabeledPoint(label, features)
-  }.persist(StorageLevel.MEMORY_AND_DISK)
+val handlePersistence = dataset.getStorageLevel == StorageLevel.NONE
+if (handlePersistence) {
+  oldDataset.persist(StorageLevel.MEMORY_AND_DISK)
+}
+
+// Train model
 val lr = new LogisticRegressionWithLBFGS
 lr.optimizer
   .setRegParam(map(regParam))
   .setNumIterations(map(maxIter))
-val lrm = new LogisticRegressionModel(this, map, lr.run(instances).weights)
-instances.unpersist()
+val oldModel = lr.run(oldDataset)
+val lrm = new LogisticRegressionModel(this, map, oldModel.weights, oldModel.intercept)
+
+if (handlePersistence) {
+  oldDataset.unpersist()
+}
+
 // copy model params
 Params.inheritValues(map, this, lrm)
 lrm
   }
 
-  private[ml] override def transformSchema(schema: StructType, paramMap: ParamMap): StructType = {
-validateAndTransformSchema(schema, paramMap, fitting = true)
-  }
+  override protected def featuresDataType: DataType = new VectorUDT
 }
 
+
 /**
  * :: AlphaComponent ::
+ *
  * Model produced by [[LogisticRegression]].
  */
 @AlphaComponent
 class LogisticRegressionModel private[ml] (
 override val parent: LogisticRegression,
--- End diff --

Why do models need to have a reference to the Estimator that produced them?





[GitHub] spark pull request: [SPARK-5187][SQL] Fix caching of tables with H...

2015-01-09 Thread cfregly
Github user cfregly commented on the pull request:

https://github.com/apache/spark/pull/3987#issuecomment-69423882
  
lgtm.  as we just discussed, this is the same code path as 
SchemaRDD.cache(), so no need for additional tests.






[GitHub] spark pull request: [Spark-3490] Disable SparkUI for tests (backpo...

2015-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3959#issuecomment-69424023
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25345/
Test FAILed.





[GitHub] spark pull request: [Spark-3490] Disable SparkUI for tests (backpo...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3959#issuecomment-69424015
  
  [Test build #25345 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25345/consoleFull)
 for   PR 3959 at commit 
[`5425314`](https://github.com/apache/spark/commit/542531483312b77ed941c277f3e05c4ef1867534).
 * This patch **fails some tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...

2015-01-09 Thread tomerk
Github user tomerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/3637#discussion_r22752722
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/ProbabilisticClassifier.scala
 ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.classification
+
+import org.apache.spark.annotation.{AlphaComponent, DeveloperApi}
+import org.apache.spark.ml.param.{HasProbabilityCol, ParamMap, Params}
+import org.apache.spark.mllib.linalg.{Vector, VectorUDT}
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.analysis.Star
+
+/**
+ * Params for probabilistic classification.
+ */
+private[classification] trait ProbabilisticClassifierParams
+  extends ClassifierParams with HasProbabilityCol {
+
+  override protected def validateAndTransformSchema(
+  schema: StructType,
+  paramMap: ParamMap,
+  fitting: Boolean,
+  featuresDataType: DataType): StructType = {
+val parentSchema = super.validateAndTransformSchema(schema, paramMap, fitting, featuresDataType)
+val map = this.paramMap ++ paramMap
+addOutputColumn(parentSchema, map(probabilityCol), new VectorUDT)
+  }
+}
+
+
+/**
+ * :: AlphaComponent ::
+ *
+ * Single-label binary or multiclass classifier which can output class 
conditional probabilities.
+ *
+ * @tparam FeaturesType  Type of input features.  E.g., [[Vector]]
+ * @tparam Learner  Concrete Estimator type
+ * @tparam M  Concrete Model type
+ */
+@AlphaComponent
+abstract class ProbabilisticClassifier[
+FeaturesType,
+Learner <: ProbabilisticClassifier[FeaturesType, Learner, M],
+M <: ProbabilisticClassificationModel[FeaturesType, M]]
+  extends Classifier[FeaturesType, Learner, M] with ProbabilisticClassifierParams {
+
+  def setProbabilityCol(value: String): Learner = set(probabilityCol, value).asInstanceOf[Learner]
+}
+
+
+/**
+ * :: AlphaComponent ::
+ *
+ * Model produced by a [[ProbabilisticClassifier]].
+ * Classes are indexed {0, 1, ..., numClasses - 1}.
+ *
+ * @tparam FeaturesType  Type of input features.  E.g., [[Vector]]
+ * @tparam M  Concrete Model type
+ */
+@AlphaComponent
+abstract class ProbabilisticClassificationModel[
+FeaturesType,
+M <: ProbabilisticClassificationModel[FeaturesType, M]]
+  extends ClassificationModel[FeaturesType, M] with ProbabilisticClassifierParams {
+
+  def setProbabilityCol(value: String): M = set(probabilityCol, value).asInstanceOf[M]
+
+  /**
+   * Transforms dataset by reading from [[featuresCol]], and appending new 
columns as specified by
+   * parameters:
+   *  - predicted labels as [[predictionCol]] of type [[Double]]
+   *  - raw predictions (confidences) as [[rawPredictionCol]] of type 
[[Vector]]
+   *  - probability of each class as [[probabilityCol]] of type [[Vector]].
+   *
+   * @param dataset input dataset
+   * @param paramMap additional parameters, overwrite embedded params
+   * @return transformed dataset
+   */
+  override def transform(dataset: SchemaRDD, paramMap: ParamMap): SchemaRDD = {
+// This default implementation should be overridden as needed.
+import dataset.sqlContext._
+import org.apache.spark.sql.catalyst.dsl._
+
+// Check schema
+transformSchema(dataset.schema, paramMap, logging = true)
+val map = this.paramMap ++ paramMap
+
+// Prepare model
+val tmpModel = if (paramMap.size != 0) {
+  val tmpModel = this.copy()
+  Params.inheritValues(paramMap, parent, tmpModel)
+  tmpModel
+} else {
+  this
+}
+
+val (numColsOutput, outputData) =
+  ClassificationModel.transformColumnsImpl[FeaturesType](dataset, tmpModel, map)
+
+// Output selected columns only.
+if (map(probabilityCol) != "") {
+  // 
