date:20160201

[GitHub] spark pull request: [SPARK-12330] [MESOS] Fix mesos coarse mode cl...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10319#discussion_r51470346
  
--- Diff: docs/running-on-mesos.md ---
@@ -387,6 +387,13 @@ See the [configuration page](configuration.html) for 
information on Spark config
 
   
 
+
+  spark.mesos.coarse.shutdown.ms
+  1 (10 seconds)
+   
+Time (in ms) to wait for executors to report that they have exited. 
Setting this too low has the risk of shutting down the Mesos driver (and 
thereby killing the spark executors) while the executor is still in the process 
of exiting cleanly.
--- End diff --

documentation needs to be updated accordingly


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12330] [MESOS] Fix mesos coarse mode cl...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10319#discussion_r51470296
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala
 ---
@@ -60,6 +62,12 @@ private[spark] class CoarseMesosSchedulerBackend(
   // Maximum number of cores to acquire (TODO: we'll need more flexible 
controls here)
   val maxCores = conf.get("spark.cores.max", Int.MaxValue.toString).toInt
 
+  private[this] val shutdownTimeoutMS = 
conf.getInt("spark.mesos.coarse.shutdown.ms", 1)
--- End diff --

this needs to be
```
conf.getTimeAsMillis("spark.mesos.coarse.shutdownTimeout", "10s")
```
we have a consistent time string format for accepting similar configs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10605][SQL] Add structs to collect_list...

2016-02-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11004#issuecomment-178169233
  
**[Test build #50493 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50493/consoleFull)**
 for PR 11004 at commit 
[`326a213`](https://github.com/apache/spark/commit/326a213dc014403aef1033e9d39206f5a873b7a6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12798] [SQL] generated BroadcastHashJoi...

2016-02-01 Thread nongli

Github user nongli commented on the pull request:

https://github.com/apache/spark/pull/10989#issuecomment-178179456
  
Can you include the generated code?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12463][SPARK-12464][SPARK-12465][SPARK-...

2016-02-01 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10057


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10820][SQL] Support for the continuous ...

2016-02-01 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/11006#discussion_r51481347
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamProgress.scala
 ---
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming
+
+import scala.collection.mutable
+
+/**
+ * A helper class that looks like a Map[Source, Offset].
+ */
+class StreamProgress extends Serializable {
+  private val currentOffsets = new mutable.HashMap[Source, Offset]
--- End diff --

I think we should not make `StreamProgress` extend `Serializable` since we 
cannot recover it using Java serialization.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10820][SQL] Support for the continuous ...

2016-02-01 Thread marmbrus

GitHub user marmbrus opened a pull request:

https://github.com/apache/spark/pull/11006

[SPARK-10820][SQL] Support for the continuous execution of structured 
queries

This is a follow up to 9aadcffabd226557174f3ff566927f873c71672e that 
extends Spark SQL to allow users to _repeatedly_ optimize and execute 
structured queries.  A `ContinuousQuery` can be expressed using SQL, DataFrames 
or Datasets.  The purpose of this PR is only to add some initial infrastructure 
which will be extended in subsequent PRs.

## User-facing API

- `sqlContext.streamFrom` and `df.streamTo` return builder objects that are 
analogous to the `read/write` interfaces already available to executing queries 
in a batch-oriented fashion.
- `ContinuousQuery` provides an interface for interacting with a query that 
is currently executing in the background.

## Internal Interfaces
 - `StreamExecution` - executes streaming queries in micro-batches

The following are currently internal, but public APIs will be provided in a 
future release.
 - `Source` - an interface for providers of continually arriving data.  A 
source must have a notion of an `Offset` that monotonically tracks what data 
has arrived.  For fault tolerance, a source must be able to replay data given a 
start offset.
 - `Sink` - an interface that accepts the results of a continuously 
executing query.  Also responsible for tracking the offset that should be 
resumed from in the case of a failure.

## Testing
 - `MemoryStream` and `MemorySink` - simple implementations of source and 
sink that keep all data in memory and have methods for simulating durability 
failures
 - `StreamTest` - a framework for performing actions and checking 
invariants on a continuous query

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marmbrus/spark structured-streaming

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11006.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11006


commit e238911f494c2db41e36c9cde9dd0b733630b36f
Author: Michael Armbrust 
Date:   2015-12-10T02:51:06Z

first draft

commit d2706b511f254feff7d4558550e403215ffb795f
Author: Michael Armbrust 
Date:   2015-12-11T07:49:33Z

working on state

commit 7a3590fe5dd659d1a47e645eec247e21280a3373
Author: Michael Armbrust 
Date:   2015-12-11T19:26:40Z

working on stateful streaming

commit c8a923831bbfc24f71eb744e36a15e432d6ae067
Author: Michael Armbrust 
Date:   2015-12-12T00:34:19Z

now with event time windows

commit 192b7fb9efc59f22009adf61a432b5bafc19
Author: Michael Armbrust 
Date:   2015-12-13T03:16:48Z

some refactoring after talking to ali

commit a0a1e7bbd85d0de3f6d90793b894a8b00f86a7f0
Author: Michael Armbrust 
Date:   2015-12-15T00:47:55Z

docs

commit 15bed31800319547eb616c0cc22c564bf967b7f1
Author: Michael Armbrust 
Date:   2015-12-15T18:04:42Z

start kinesis

commit 89464a91e3954f30b68a1f633e2b9c062e08fa2c
Author: Michael Armbrust 
Date:   2015-12-17T02:18:35Z

some renaming

commit d133dbbed3d3ff7146129cd6532b1f26f2201315
Author: Michael Armbrust 
Date:   2015-12-28T00:01:57Z

WIP: file source

commit e3c4c8301fdcfaaa0bd56ed92a81e4e1d2db64a8
Author: Michael Armbrust 
Date:   2016-01-05T05:39:40Z

Merge remote-tracking branch 'origin/master' into streaming-infra

Conflicts:
sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala

commit 90fa6d30f0bf70a7c0c4bfe0889a07cbdddbf203
Author: Michael Armbrust 
Date:   2016-01-05T05:52:43Z

remove half-baked stateful implementation

commit b1c1dc6ead2c76589e48f4c28ef02e6c4361e549
Author: Michael Armbrust 
Date:   2016-01-05T06:36:46Z

cleanup

commit 92050688699b4c509f5ec5c2b636797cb5ae3c89
Author: Michael Armbrust 
Date:   2016-01-05T23:32:34Z

more docs

commit 7a59f00b4f0d238c33d3250fcfe162d7c2c27c18
Author: Josh Rosen 
Date:   2016-01-06T00:00:48Z

Add circle.yml file.

commit eab186d5780b2946a32041ca17d80323b052efc6
Author: Michael Armbrust 
Date:   2016-01-06T00:48:42Z

rollback changes

commit 20750630bd50bf7d0abad4a23b2a717705115e46
Author: Josh Rosen 
Date:   2016-01-06T00:56:16Z

Try using cached resolution to speed up compilation

commit dabc102271e09bae2ddca1366deb13b20c10ded3
Author: Josh Rosen 
Date:   2016-01-06T00:59:10Z

Use assembly/assembly.

commit

[GitHub] spark pull request: [SPARK-12798] [SQL] generated BroadcastHashJoi...

2016-02-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10989#issuecomment-178133344
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-7889] [CORE] HistoryServer to refresh c...

2016-02-01 Thread squito

Github user squito commented on the pull request:

https://github.com/apache/spark/pull/6935#issuecomment-178139026
  
jenkins, add to whitelist


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12705] [SPARK-10777] [SQL] Analyzer Rul...

2016-02-01 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10678


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10820][SQL] Support for the continuous ...

2016-02-01 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/11006#discussion_r51480080
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamProgress.scala
 ---
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming
+
+import scala.collection.mutable
+
+/**
+ * A helper class that looks like a Map[Source, Offset].
+ */
+class StreamProgress extends Serializable {
+  private val currentOffsets = new mutable.HashMap[Source, Offset]
+with mutable.SynchronizedMap[Source, Offset]
+
+  private[streaming] def update(source: Source, newOffset: Offset): Unit = 
{
+currentOffsets.get(source).foreach(old =>
+  assert(newOffset > old, s"Stream going backwards $newOffset -> 
$old"))
+currentOffsets.put(source, newOffset)
+  }
+
+  private[streaming] def update(newOffset: (Source, Offset)): Unit =
+update(newOffset._1, newOffset._2)
+
+  private[streaming] def apply(source: Source): Offset = 
currentOffsets(source)
+  private[streaming] def get(source: Source): Option[Offset] = 
currentOffsets.get(source)
+  private[streaming] def contains(source: Source): Boolean = 
currentOffsets.contains(source)
+
+  private[streaming] def ++(updates: Map[Source, Offset]): StreamProgress 
= {
+val updated = new StreamProgress
+currentOffsets.foreach(updated.update)
+updates.foreach(updated.update)
+updated
+  }
+
+  /**
+   * Used to create a new copy of this [[StreamProgress]]. While this 
class is currently mutable,
+   * it should be copied before being passed to user code.
+   */
+  private[streaming] def copy(): StreamProgress = {
+val copied = new StreamProgress
+currentOffsets.foreach(copied.update)
+copied
+  }
+
+  override def toString: String =
+currentOffsets.map { case (k, v) => s"$k: $v"}.mkString("{", ",", "}")
+
+  override def equals(other: Any): Boolean = other match {
+case s: StreamProgress =>
+  s.currentOffsets.keys.toSet == currentOffsets.keys.toSet &&
--- End diff --

`LongOffset` and `CompositeOffset` don't implement `equals` and `hashCode`, 
`StreamProgress`'s `equals` and `hashCode` doesn't work.

In addition, why not just call `currentOffsets.map(_._1.toString) == s. 
currentOffsets.map(_._1.toString)` for `equals`, and 
``currentOffsets.map(_._1.toString).hashCode` for `hashCode`? `mutable.HashMap` 
should already do that correctly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12798] [SQL] generated BroadcastHashJoi...

2016-02-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10989#issuecomment-178188600
  
**[Test build #2486 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2486/consoleFull)**
 for PR 10989 at commit 
[`c1c0588`](https://github.com/apache/spark/commit/c1c0588053af5aa359b6d03bac6c5d0b198c5b69).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10749][MESOS] Support multiple roles wi...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8872#discussion_r51481128
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterSchedulerSuite.scala
 ---
@@ -0,0 +1,139 @@
+/*
--- End diff --

what changed here? Did you move this file or something?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

2016-02-01 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10844#discussion_r51482277
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
 ---
@@ -180,6 +221,46 @@ case class Join(
 }
   }
 
+  private def constructIsNotNullConstraints(condition: Expression): 
Set[Expression] = {
+// Currently we only propagate constraints if the condition consists 
of equality
+// and ranges. For all other cases, we return an empty set of 
constraints
+splitConjunctivePredicates(condition).map {
+  case EqualTo(l, r) =>
+Set(IsNotNull(l), IsNotNull(r))
+  case GreaterThan(l, r) =>
+Set(IsNotNull(l), IsNotNull(r))
+  case GreaterThanOrEqual(l, r) =>
+Set(IsNotNull(l), IsNotNull(r))
+  case LessThan(l, r) =>
+Set(IsNotNull(l), IsNotNull(r))
+  case LessThanOrEqual(l, r) =>
+Set(IsNotNull(l), IsNotNull(r))
+  case _ =>
+Set.empty[Expression]
+}.foldLeft(Set.empty[Expression])(_ union _.toSet)
+  }
+
+  override protected def validConstraints: Set[Expression] = {
+joinType match {
+  case Inner if condition.isDefined =>
+left.constraints
+  .union(right.constraints)
+  .union(constructIsNotNullConstraints(condition.get))
--- End diff --

We should also be including the split form of the condition here and below, 
right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10820][SQL] Support for the continuous ...

2016-02-01 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/11006#discussion_r51483011
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StreamTest.scala ---
@@ -0,0 +1,346 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.lang.Thread.UncaughtExceptionHandler
+
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+import scala.util.Random
+
+import org.scalatest.concurrent.Timeouts
+import org.scalatest.time.SpanSugar._
+
+import org.apache.spark.sql.catalyst.encoders.{encoderFor, 
ExpressionEncoder, RowEncoder}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.execution.streaming._
+
+/**
+ * A framework for implementing tests for streaming queries and sources.
+ *
+ * A test consists of a set of steps (expressed as a `StreamAction`) that 
are executed in order,
+ * blocking as necessary to let the stream catch up.  For example, the 
following adds some data to
+ * a stream, blocking until it can verify that the correct values are 
eventually produced.
+ *
+ * {{{
+ *  val inputData = MemoryStream[Int]
+val mapped = inputData.toDS().map(_ + 1)
+
+testStream(mapped)(
+  AddData(inputData, 1, 2, 3),
+  CheckAnswer(2, 3, 4))
+ * }}}
+ *
+ * Note that while we do sleep to allow the other thread to progress 
without spinning,
+ * `StreamAction` checks should not depend on the amount of time spent 
sleeping.  Instead they
+ * should check the actual progress of the stream before verifying the 
required test condition.
+ *
+ * Currently it is assumed that all streaming queries will eventually 
complete in 10 seconds to
+ * avoid hanging forever in the case of failures. However, individual 
suites can change this
+ * by overriding `streamingTimeout`.
+ */
+trait StreamTest extends QueryTest with Timeouts {
+
+  implicit class RichSource(s: Source) {
+def toDF(): DataFrame = new DataFrame(sqlContext, StreamingRelation(s))
+  }
+
+  /** How long to wait for an active stream to catch up when checking a 
result. */
+  val streamingTimout = 10.seconds
+
+  /** A trait for actions that can be performed while testing a streaming 
DataFrame. */
+  trait StreamAction
+
+  /** A trait to mark actions that require the stream to be actively 
running. */
+  trait StreamMustBeRunning
+
+  /**
+   * Adds the given data to the stream.  Subsuquent check answers will 
block until this data has
+   * been processed.
+   */
+  object AddData {
+def apply[A](source: MemoryStream[A], data: A*): AddDataMemory[A] =
+  AddDataMemory(source, data)
+  }
+
+  /** A trait that can be extended when testing other sources. */
+  trait AddData extends StreamAction {
+def source: Source
+
+/**
+ * Called to trigger adding the data.  Should return the offset that 
will denote when this
+ * new data has been processed.
+ */
+def addData(): Offset
+  }
+
+  case class AddDataMemory[A](source: MemoryStream[A], data: Seq[A]) 
extends AddData {
+override def toString: String = s"AddData to $source: 
${data.mkString(",")}"
+
+override def addData(): Offset = {
+  source.addData(data)
+}
+  }
+
+  /**
+   * Checks to make sure that the current data stored in the sink matches 
the `expectedAnswer`.
+   * This operation automatically blocks untill all added data has been 
processed.
+   */
+  object CheckAnswer {
+def apply[A : Encoder](data: A*): CheckAnswerRows = {
+  val encoder = encoderFor[A]
+  val toExternalRow = RowEncoder(encoder.schema)
+  CheckAnswerRows(data.map(d => 
toExternalRow.fromRow(encoder.toRow(d
+}
+
+def apply(rows: Row*):

[GitHub] spark pull request: [SPARK-12506][SPARK-12126][SQL]use CatalystSca...

2016-02-01 Thread huaxingao

GitHub user huaxingao opened a pull request:

https://github.com/apache/spark/pull/11005

[SPARK-12506][SPARK-12126][SQL]use CatalystScan for JDBCRelation

As suggested https://issues.apache.org/jira/browse/SPARK-9182?focusedCommentId=15031526page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15031526;
 class="external-link" rel="nofollow">here, I will change JDBCRelation to 
implement CatalystScan, and then directly access Catalyst expressions in 
JDBCRDD.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huaxingao/spark spark-12126

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11005.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11005


commit 0bdfea86fc0671bf636bbed3174baa417de6367e
Author: Huaxin Gao 
Date:   2016-01-31T04:01:12Z

[SPARK-12506][SPARK-12126][SQL]use CatalystScan for JDBCRelation




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-6847][Core][Streaming]Fix stack overflo...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/10934#issuecomment-178132065
  
I did not merge this into 1.6 or before because of 2 reasons:
- It doesn't merge cleanly, and more importantly
- This changes internal semantics and it's not technically a bug
Let me know if you disagree.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-02-01 Thread iyounus

Github user iyounus commented on the pull request:

https://github.com/apache/spark/pull/10702#issuecomment-178134675
  
For the case (3), I'm assuming that the label and features are not 
standardized. So, in that case, the solution exists. Here is my perspective on 
this.

The normal equation `X^T X \beta = X^T y` has a unique solution if matrix 
`X` is full rank. If its not full rank, the `X^T X` becomes singular and hence 
not invertible. With regularization, the equation  `(X^T X + \lambda I) \beta = 
X^T y` still has unique solution if the matrix `(X^T X + \lambda I)` is 
invertible. This is true even if elements of `y` are constant. For the sample 
data I'm using above, I can solve this equation by hand and obtain `\beta` 
(with and without intercept). If I'm using any software, it must reproduce 
these results. Note that standardization is completely independent operation. 
Its not part of normal equation. So, if the user demands no standardization, 
then the any software should produce the analytical solution.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12989] [SQL] Delaying Alias Cleanup aft...

2016-02-01 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/10963#issuecomment-178139377
  
Thanks, merging to master and 1.6.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10820][SQL] Support for the continuous ...

2016-02-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11006#issuecomment-178146931
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50497/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12330] [MESOS] Fix mesos coarse mode cl...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10319#discussion_r51470974
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala
 ---
@@ -364,7 +379,27 @@ private[spark] class CoarseMesosSchedulerBackend(
   }
 
   override def stop() {
-super.stop()
+// Make sure we're not launching tasks during shutdown
+stateLock.synchronized {
+  if (stopCalled) {
+logWarning("Stop called multiple times, ignoring")
+return
+  }
+  stopCalled = true
+  super.stop()
+}
+// Wait for finish
+val stopwatch = new Stopwatch()
+stopwatch.start()
+// slaveIdsWithExecutors has no memory barrier, so this is eventually 
consistent
+while (slaveIdsWithExecutors.nonEmpty &&
+  stopwatch.elapsed(TimeUnit.MILLISECONDS) < shutdownTimeoutMS) {
+  Thread.sleep(100)
+}
+if(slaveIdsWithExecutors.nonEmpty) {
--- End diff --

style:
```
if (slaveIds...) {
}
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12330] [MESOS] Fix mesos coarse mode cl...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10319#discussion_r51470918
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala
 ---
@@ -364,7 +379,27 @@ private[spark] class CoarseMesosSchedulerBackend(
   }
 
   override def stop() {
-super.stop()
+// Make sure we're not launching tasks during shutdown
+stateLock.synchronized {
+  if (stopCalled) {
+logWarning("Stop called multiple times, ignoring")
+return
+  }
+  stopCalled = true
+  super.stop()
+}
+// Wait for finish
--- End diff --

can you expand on this comment? From the code my understanding is that we 
need to wait until all slaves have properly shutdown before we terminate the 
Mesos task running the driver. It would be good if this comment could provide 
more context on why we're doing this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12790][CORE] Remove HistoryServer old m...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10860#discussion_r51472337
  
--- Diff: 
core/src/test/resources/spark-events/local-1425081759268/EVENT_LOG_1 ---
@@ -0,0 +1,88 @@
+{"Event":"SparkListenerBlockManagerAdded","Block Manager ID":{"Executor 
ID":"","Host":"localhost","Port":57967},"Maximum 
Memory":278302556,"Timestamp":1425081759407}
--- End diff --

I might be missing something, why do we still have test files using the old 
log format? Shouldn't we just replace all old test logs using the new format?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12790][CORE] Remove HistoryServer old m...

2016-02-01 Thread felixcheung

Github user felixcheung commented on the pull request:

https://github.com/apache/spark/pull/10860#issuecomment-178166829
  
sure - that was just to make sure it doesn't get picked up, as a negative 
test. I can purge all of that since it seems both you and @vanzin have the same 
feedback?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

2016-02-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10998#issuecomment-178166922
  
**[Test build #50494 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50494/consoleFull)**
 for PR 10998 at commit 
[`02f2a25`](https://github.com/apache/spark/commit/02f2a25346edd718d9371955759478408e77382b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10820][SQL] Support for the continuous ...

2016-02-01 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/11006#discussion_r51476814
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
 ---
@@ -0,0 +1,213 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming
+
+import java.lang.Thread.UncaughtExceptionHandler
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.Logging
+import org.apache.spark.sql.{ContinuousQuery, DataFrame, SQLContext}
+import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap}
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, 
LogicalPlan}
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.execution.QueryExecution
+
+/**
+ * Manages the execution of a streaming Spark SQL query that is occurring 
in a separate thread.
+ * Unlike a standard query, a streaming query executes repeatedly each 
time new data arrives at any
+ * [[Source]] present in the query plan. Whenever new data arrives, a 
[[QueryExecution]] is created
+ * and the results are committed transactionally to the given [[Sink]].
+ */
+class StreamExecution(
+sqlContext: SQLContext,
+private[sql] val logicalPlan: LogicalPlan,
+val sink: Sink) extends ContinuousQuery with Logging {
+
+  /** An monitor used to wait/notify when batches complete. */
+  val awaitBatchLock = new Object
+
+  @volatile
+  var batchRun = false
--- End diff --

private


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10820][SQL] Support for the continuous ...

2016-02-01 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/11006#discussion_r51476822
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
 ---
@@ -0,0 +1,213 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming
+
+import java.lang.Thread.UncaughtExceptionHandler
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.Logging
+import org.apache.spark.sql.{ContinuousQuery, DataFrame, SQLContext}
+import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap}
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, 
LogicalPlan}
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.execution.QueryExecution
+
+/**
+ * Manages the execution of a streaming Spark SQL query that is occurring 
in a separate thread.
+ * Unlike a standard query, a streaming query executes repeatedly each 
time new data arrives at any
+ * [[Source]] present in the query plan. Whenever new data arrives, a 
[[QueryExecution]] is created
+ * and the results are committed transactionally to the given [[Sink]].
+ */
+class StreamExecution(
+sqlContext: SQLContext,
+private[sql] val logicalPlan: LogicalPlan,
+val sink: Sink) extends ContinuousQuery with Logging {
+
+  /** An monitor used to wait/notify when batches complete. */
+  val awaitBatchLock = new Object
+
+  @volatile
+  var batchRun = false
+
+  /** Minimum amount of time in between the start of each batch. */
+  val minBatchTime = 10
--- End diff --

private


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10820][SQL] Support for the continuous ...

2016-02-01 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/11006#discussion_r51476807
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
 ---
@@ -0,0 +1,213 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming
+
+import java.lang.Thread.UncaughtExceptionHandler
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.Logging
+import org.apache.spark.sql.{ContinuousQuery, DataFrame, SQLContext}
+import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap}
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, 
LogicalPlan}
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.execution.QueryExecution
+
+/**
+ * Manages the execution of a streaming Spark SQL query that is occurring 
in a separate thread.
+ * Unlike a standard query, a streaming query executes repeatedly each 
time new data arrives at any
+ * [[Source]] present in the query plan. Whenever new data arrives, a 
[[QueryExecution]] is created
+ * and the results are committed transactionally to the given [[Sink]].
+ */
+class StreamExecution(
+sqlContext: SQLContext,
+private[sql] val logicalPlan: LogicalPlan,
+val sink: Sink) extends ContinuousQuery with Logging {
+
+  /** An monitor used to wait/notify when batches complete. */
+  val awaitBatchLock = new Object
--- End diff --

private


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10820][SQL] Support for the continuous ...

2016-02-01 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/11006#discussion_r51480355
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamProgress.scala
 ---
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming
+
+import scala.collection.mutable
+
+/**
+ * A helper class that looks like a Map[Source, Offset].
+ */
+class StreamProgress extends Serializable {
+  private val currentOffsets = new mutable.HashMap[Source, Offset]
--- End diff --

@transient since Source is not Serializable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12832][MESOS] mesos scheduler respect a...

2016-02-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10949#issuecomment-178189170
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50500/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12832][MESOS] mesos scheduler respect a...

2016-02-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10949#issuecomment-178189155
  
**[Test build #50500 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50500/consoleFull)**
 for PR 10949 at commit 
[`7c6650d`](https://github.com/apache/spark/commit/7c6650dceed4eaf9e9f50542c76c88fccf7ac4a2).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [test-maven] Shade protobuf-java

2016-02-01 Thread zsxwing

Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/10995#issuecomment-178197729
  
Since Hadoop doesn't shade `protobuf`, I think this won't fix the issue in 
the PR description. Right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

2016-02-01 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10844#discussion_r51482050
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
 ---
@@ -89,9 +89,27 @@ case class Generate(
 
 case class Filter(condition: Expression, child: LogicalPlan) extends 
UnaryNode {
   override def output: Seq[Attribute] = child.output
+
+  override protected def validConstraints: Set[Expression] = {
+val newConstraint = splitConjunctivePredicates(condition)
+  .filter(_.references.subsetOf(outputSet))
--- End diff --

not needed right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

2016-02-01 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10844#discussion_r51482081
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
 ---
@@ -180,6 +221,46 @@ case class Join(
 }
   }
 
+  private def constructIsNotNullConstraints(condition: Expression): 
Set[Expression] = {
--- End diff --

Why is this done as a special case here, instead of doing it as part of 
`getRelevantConstraints`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12265][Mesos] Spark calls System.exit i...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/10921#issuecomment-178204415
  
LGTM merging into master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

2016-02-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10998#issuecomment-178128143
  
**[Test build #50494 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50494/consoleFull)**
 for PR 10998 at commit 
[`02f2a25`](https://github.com/apache/spark/commit/02f2a25346edd718d9371955759478408e77382b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

2016-02-01 Thread mridulm

Github user mridulm commented on the pull request:

https://github.com/apache/spark/pull/10838#issuecomment-178141964
  
@zsxwing Other than the PR title - I think everything else says inflight in 
code/documentation. Did you see anything to the contrary ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10820][SQL] Support for the continuous ...

2016-02-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11006#issuecomment-178143411
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50496/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10820][SQL] Support for the continuous ...

2016-02-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11006#issuecomment-178143406
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-5095] [Mesos] Support launching multipl...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/10993#issuecomment-178151143
  
Looks like this is failing real tests


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12705] [SPARK-10777] [SQL] Analyzer Rul...

2016-02-01 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/10678#issuecomment-178161088
  
LGTM, thanks for adding a bunch of tests!

Merging to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12330] [MESOS] Fix mesos coarse mode cl...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10319#discussion_r51471308
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala
 ---
@@ -364,7 +379,27 @@ private[spark] class CoarseMesosSchedulerBackend(
   }
 
   override def stop() {
-super.stop()
+// Make sure we're not launching tasks during shutdown
+stateLock.synchronized {
+  if (stopCalled) {
+logWarning("Stop called multiple times, ignoring")
+return
+  }
+  stopCalled = true
+  super.stop()
+}
+// Wait for finish
+val stopwatch = new Stopwatch()
+stopwatch.start()
+// slaveIdsWithExecutors has no memory barrier, so this is eventually 
consistent
+while (slaveIdsWithExecutors.nonEmpty &&
+  stopwatch.elapsed(TimeUnit.MILLISECONDS) < shutdownTimeoutMS) {
+  Thread.sleep(100)
+}
+if(slaveIdsWithExecutors.nonEmpty) {
+  logWarning(s"${slaveIdsWithExecutors.size} executors still running. "
++ "Proceeding with mesos driver stop.")
--- End diff --

I don't understand this warning message. I think you mean something more 
like
```
Timed out on waiting for executors to terminate ($X still running) after 
$timeout ms.
Proceeding to stop Mesos driver, which may lead to leftover temporary files 
on the slaves.
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-11780][SQL] Add catalyst type aliases b...

2016-02-01 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/10915#issuecomment-178167383
  
Thanks, merged to branch-1.6.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

2016-02-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10998#issuecomment-178167285
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50494/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

2016-02-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10998#issuecomment-178167283
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

2016-02-01 Thread nongli

Github user nongli commented on the pull request:

https://github.com/apache/spark/pull/10998#issuecomment-178169324
  
Can you include the generated code?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12790][CORE] Remove HistoryServer old m...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/10860#issuecomment-178169396
  
I see, that seems unnecessary. We deleted all the code path that would 
parse the format so it's pretty much impossible for it to be picked up right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12299][CORE][WIP] Remove history servin...

2016-02-01 Thread BryanCutler

Github user BryanCutler commented on the pull request:

https://github.com/apache/spark/pull/10991#issuecomment-178169443
  
cc @andrewor14, @squito for thoughts on above


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10605][SQL] Add structs to collect_list...

2016-02-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11004#issuecomment-178169641
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10605][SQL] Add structs to collect_list...

2016-02-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11004#issuecomment-178169642
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50493/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [ML][MINOR] Invalid MulticlassClassification r...

2016-02-01 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/10996#issuecomment-178169426
  
LGTM. Merged into master. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-5095] [Mesos] Support launching multipl...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/10993#issuecomment-178173549
  
Also, follow-up question: have you tested this with cluster mode and/or 
with dynamic allocation?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12832][MESOS] mesos scheduler respect a...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/10949#issuecomment-178178205
  
By the way, I'd just like to point out that there is another patch that 
fixes the same issue #10768. @tnachen @dragos what's the difference and which 
one should we proceed with?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12832][CORE] Fix dispatcher does not ha...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/10768#issuecomment-178178342
  
FYI #10949 is another patch for the same issue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12265][Mesos] Spark calls System.exit i...

2016-02-01 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10921


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12992][SQL]: Update parquet reader to s...

2016-02-01 Thread nongli

Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/10908#discussion_r51462281
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java
 ---
@@ -180,6 +188,152 @@ public void readIntegers(int total, ColumnVector c, 
int rowId, int level,
 }
   }
 
+  // TODO: can this code duplication be removed without a perf penalty?
+  public void readBytes(int total, ColumnVector c,
+int rowId, int level, VectorizedValuesReader data) 
{
+int left = total;
+while (left > 0) {
+  if (this.currentCount == 0) this.readNextGroup();
+  int n = Math.min(left, this.currentCount);
+  switch (mode) {
+case RLE:
+  if (currentValue == level) {
+data.readBytes(n, c, rowId);
+c.putNotNulls(rowId, n);
+  } else {
+c.putNulls(rowId, n);
+  }
+  break;
+case PACKED:
+  for (int i = 0; i < n; ++i) {
+if (currentBuffer[currentBufferIdx++] == level) {
+  c.putByte(rowId + i, data.readByte());
+  c.putNotNull(rowId + i);
+} else {
+  c.putNull(rowId + i);
+}
+  }
+  break;
+  }
+  rowId += n;
+  left -= n;
+  currentCount -= n;
+}
+  }
+
+  public void readLongs(int total, ColumnVector c, int rowId, int level,
+VectorizedValuesReader data) {
+int left = total;
+while (left > 0) {
+  if (this.currentCount == 0) this.readNextGroup();
+  int n = Math.min(left, this.currentCount);
+  switch (mode) {
+case RLE:
+  if (currentValue == level) {
+data.readLongs(n, c, rowId);
+c.putNotNulls(rowId, n);
+  } else {
+c.putNulls(rowId, n);
+  }
+  break;
+case PACKED:
+  for (int i = 0; i < n; ++i) {
+if (currentBuffer[currentBufferIdx++] == level) {
+  c.putLong(rowId + i, data.readLong());
+  c.putNotNull(rowId + i);
+} else {
+  c.putNull(rowId + i);
+}
+  }
+  break;
+  }
+  rowId += n;
+  left -= n;
+  currentCount -= n;
+}
+  }
+
+  public void readBinarys(int total, ColumnVector c, int rowId, int level,
+VectorizedValuesReader data) {
+int left = total;
+while (left > 0) {
+  if (this.currentCount == 0) this.readNextGroup();
+  int n = Math.min(left, this.currentCount);
+  switch (mode) {
+case RLE:
+  if (currentValue == level) {
+c.putNotNulls(rowId, n);
+data.readBinary(n, c, rowId);
+  } else {
+c.putNulls(rowId, n);
+  }
+  break;
+case PACKED:
+  for (int i = 0; i < n; ++i) {
+if (currentBuffer[currentBufferIdx++] == level) {
+  c.putNotNull(rowId + i);
+  data.readBinary(1, c, rowId);
+} else {
+  c.putNull(rowId + i);
+}
+  }
+  break;
+  }
+  rowId += n;
+  left -= n;
+  currentCount -= n;
+}
+  }
+
+
+  // This is used for decoding dictionary IDs (as opposed to definition 
levels).
+  @Override
+  public void readIntegers(int total, ColumnVector c, int rowId) {
+int left = total;
+while (left > 0) {
+if (this.currentCount == 0) this.readNextGroup();
+  int n = Math.min(left, this.currentCount);
+  switch (mode) {
+case RLE:
+  c.putInts(rowId, n, currentValue);
+  break;
+case PACKED:
+  c.putInts(rowId, n, currentBuffer, currentBufferIdx);
+  currentBufferIdx += n;
+  break;
+  }
+  rowId += n;
+  left -= n;
+  currentCount -= n;
+}
+  }
+
+  @Override
+  public byte readByte() {
+throw new UnsupportedOperationException("only readInts is valid.");
--- End diff --

This should be readInts. The only valid read* APIs that doesn't also decode 
definition levels is used to decode dictionary ids, which are always ints. I 
updated the comment for readIntegers() to try to capture this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but

[GitHub] spark pull request: [SPARK-12992][SQL]: Update parquet reader to s...

2016-02-01 Thread nongli

Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/10908#discussion_r51462376
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
 ---
@@ -52,15 +55,49 @@ public void skip(int n) {
   }
 
   @Override
-  public void readIntegers(int total, ColumnVector c, int rowId) {
+  public final void readIntegers(int total, ColumnVector c, int rowId) {
 c.putIntsLittleEndian(rowId, total, buffer, offset - 
Platform.BYTE_ARRAY_OFFSET);
 offset += 4 * total;
   }
 
   @Override
-  public int readInteger() {
+  public final void readLongs(int total, ColumnVector c, int rowId) {
+c.putLongsLittleEndian(rowId, total, buffer, offset - 
Platform.BYTE_ARRAY_OFFSET);
+offset += 8 * total;
+  }
+
+  @Override
+  public final void readBytes(int total, ColumnVector c, int rowId) {
+c.putBytes(rowId, total, buffer, offset - Platform.BYTE_ARRAY_OFFSET);
+offset += total;
+  }
+
+  @Override
+  public final int readInteger() {
 int v = Platform.getInt(buffer, offset);
 offset += 4;
 return v;
   }
+
+  @Override
+  public final long readLong() {
+long v = Platform.getLong(buffer, offset);
+offset += 8;
+return v;
+  }
+
+  @Override
+  public final byte readByte() {
+return (byte)readInteger();
--- End diff --

4. Parquet only has 2 integers physical types: int32 and int64 and relies 
on encodings to compress bytes/shorts.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

2016-02-01 Thread zsxwing

Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/10838#issuecomment-178129985
  
@tgravescs in general, this patch doesn't limit number of concurrent 
outbound connections. Instead, it just limits number of in flight blocks. Could 
you update the title and comments/variable names/configuration in this PR to 
make it accurate? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10820][SQL] Support for the continuous ...

2016-02-01 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/11006#issuecomment-178144361
  
test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12478][SQL] Bugfix: Dataset fields of p...

2016-02-01 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/10431#issuecomment-178158109
  
Yeah, since its labeled experimental, I'd like to fix as many Dataset bugs 
as we can for 1.6.1.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12798] [SQL] generated BroadcastHashJoi...

2016-02-01 Thread nongli

Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/10989#discussion_r51477107
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/BufferedRowIterator.java 
---
@@ -54,13 +54,27 @@ public void setInput(Iterator iter) {
   }
 
   /**
+   * Returns whether `processNext()` should stop processing next row from 
`input` or not.
+   */
+  protected boolean shouldStop() {
--- End diff --

@rxin this seems like it could be used to support limit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12832][MESOS] mesos scheduler respect a...

2016-02-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10949#issuecomment-178189166
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10820][SQL] Support for the continuous ...

2016-02-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11006#issuecomment-178191589
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50498/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10749][MESOS] Support multiple roles wi...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8872#discussion_r51481006
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala
 ---
@@ -525,14 +531,14 @@ private[spark] class MesosClusterScheduler(
 d.retryState.get.nextRetry.before(currentTime)
   }
 
-  scheduleTasks(
+  currentOffers = scheduleTasks(
--- End diff --

similarly I don't get the point of assigning this var every time you call 
`scheduleTasks`. It's the same thing, isn't it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10820][SQL] Support for the continuous ...

2016-02-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11006#issuecomment-178191582
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

2016-02-01 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10844#discussion_r51482488
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala
 ---
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.analysis._
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.logical._
+
+class ConstraintPropagationSuite extends SparkFunSuite {
+
+  private def resolveColumn(tr: LocalRelation, columnName: String): 
Expression =
+tr.analyze.resolveQuoted(columnName, caseInsensitiveResolution).get
+
+  private def verifyConstraints(a: Set[Expression], b: Set[Expression]): 
Unit = {
+assert(a.forall(i => b.map(_.semanticEquals(i)).reduce(_ || _)))
+assert(b.forall(i => a.map(_.semanticEquals(i)).reduce(_ || _)))
--- End diff --

I would make this function manually call `fail` with the condition that we 
can't find, and also differentiate between `missing` and `found but not 
expected`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-6847][Core][Streaming]Fix stack overflo...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10934#discussion_r51463360
  
--- Diff: 
streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala ---
@@ -821,6 +821,75 @@ class CheckpointSuite extends TestSuiteBase with 
DStreamCheckpointTester
 checkpointWriter.stop()
   }
 
+  test("SPARK-6847: stack overflow when updateStateByKey is followed by a 
checkpointed dstream") {
+// In this test, there are two updateStateByKey operators. The RDD DAG 
is as follows:
+//
+// batch 1batch 2batch 3 ...
+//
+// 1) input rdd  input rdd  input rdd
+//   |  |  |
+//   v  v  v
+// 2) cogroup rdd   ---> cogroup rdd   ---> cogroup rdd  ...
+//   | /| /|
+//   v/ v/ v
+// 3)  map rdd ---map rdd ---map rdd ...
+//   |  |  |
+//   v  v  v
+// 4) cogroup rdd   ---> cogroup rdd   ---> cogroup rdd  ...
+//   | /| /|
+//   v/ v/ v
+// 5)  map rdd ---map rdd ---map rdd ...
+//
+// Every batch depends on its previous batch, so "updateStateByKey" 
needs to do checkpoint to
+// break the RDD chain. However, before SPARK-6847, when the state RDD 
(layer 5) of the second
+// "updateStateByKey" does checkpoint, it won't checkpoint the state 
RDD (layer 3) of the first
+// "updateStateByKey" (Note: "updateStateByKey" has already marked 
that its state RDD (layer 3)
+// should be checkpointed). Hence, the connections between layer 2 and 
layer 3 won't be broken
+// and the RDD chain will grow infinitely and cause StackOverflow.
+//
+// Therefore SPARK-6847 introduces 
"spark.checkpoint.checkpointAllMarked" to force checkpointing
+// all marked RDDs in the DAG to resolve this issue. (For the previous 
example, it will break
+// connections between layer 2 and layer 3)
+ssc = new StreamingContext(master, framework, batchDuration)
+val batchCounter = new BatchCounter(ssc)
+ssc.checkpoint(checkpointDir)
+val inputDStream = new CheckpointInputDStream(ssc)
+val updateFunc = (values: Seq[Int], state: Option[Int]) => {
+  Some(values.sum + state.getOrElse(0))
+}
+@volatile var shouldCheckpointAllMarkedRDDs = false
+@volatile var rddsCheckpointed = false
+inputDStream.map(i => (i, i))
+  .updateStateByKey(updateFunc).checkpoint(batchDuration)
+  .updateStateByKey(updateFunc).checkpoint(batchDuration)
+  .foreachRDD { rdd =>
+/**
+ * Find all RDDs that are marked for checkpointing in the 
specified RDD and its ancestors.
+ */
+def findAllMarkedRDDs(rdd: RDD[_]): List[RDD[_]] = {
--- End diff --

oh I see... that's unfortunate


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-6847][Core][Streaming]Fix stack overflo...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/10934#issuecomment-178130645
  
Merged into master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12798] [SQL] generated BroadcastHashJoi...

2016-02-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10989#issuecomment-178133351
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50495/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12989] [SQL] Delaying Alias Cleanup aft...

2016-02-01 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10963


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10820][SQL] Support for the continuous ...

2016-02-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11006#issuecomment-178151805
  
**[Test build #50498 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50498/consoleFull)**
 for PR 11006 at commit 
[`7147b3f`](https://github.com/apache/spark/commit/7147b3f8b25e73102a8ba98d81c175acd74f49ca).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [Docs] Fix the jar location of datanucleus in ...

2016-02-01 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/10901#issuecomment-178163321
  
Merging to master and 1.6


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12790][CORE] Remove HistoryServer old m...

2016-02-01 Thread felixcheung

Github user felixcheung commented on the pull request:

https://github.com/apache/spark/pull/10860#issuecomment-178159771
  
Could we merge this please?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [Docs] Fix the jar location of datanucleus in ...

2016-02-01 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10901


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-11780][SQL] Add catalyst type aliases b...

2016-02-01 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/10915#issuecomment-178172474
  
Can you close this PR now?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12832][MESOS] mesos scheduler respect a...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/10949#issuecomment-178177036
  
Also cc @tnachen who wrote this code originally. From the JIRA:
> CoarseMesosSchedulerBackend have constraints feature but dispacher deploy 
use MesosClusterScheduler, it is different method.
Is this caused by duplicate code somewhere? Can we resolve that, either in 
this patch or separately?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10749][MESOS] Support multiple roles wi...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8872#discussion_r51480776
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala
 ---
@@ -497,28 +505,28 @@ private[spark] class MesosClusterScheduler(
 container, image, volumes = volumes, portmaps = portmaps)
   taskInfo.setContainer(container.build())
 }
-val queuedTasks = tasks.getOrElseUpdate(offer.offer.getId, new 
ArrayBuffer[TaskInfo])
+val queuedTasks = tasks.getOrElseUpdate(offer.offerId, new 
ArrayBuffer[TaskInfo])
 queuedTasks += taskInfo.build()
-logTrace(s"Using offer ${offer.offer.getId.getValue} to launch 
driver " +
+logTrace(s"Using offer ${offer.offerId.getValue} to launch driver 
" +
   submission.submissionId)
-val newState = new MesosClusterSubmissionState(submission, taskId, 
offer.offer.getSlaveId,
+val newState = new MesosClusterSubmissionState(submission, taskId, 
offer.slaveId,
   None, new Date(), None)
 launchedDrivers(submission.submissionId) = newState
 launchedDriversState.persist(submission.submissionId, newState)
 afterLaunchCallback(submission.submissionId)
   }
 }
+currentOffers
--- End diff --

@tnachen I also don't get it. If the caller called us with `currentOffers`, 
they already have access to it, so what's the point in returning them here? 
AFAIK we don't mutate this list in this method


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: java mapwithstate, broken java mapping

2016-02-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11007#issuecomment-178190454
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12506][SPARK-12126][SQL]use CatalystSca...

2016-02-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11005#issuecomment-178126949
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-7889] [CORE] HistoryServer to refresh c...

2016-02-01 Thread squito

Github user squito commented on the pull request:

https://github.com/apache/spark/pull/6935#issuecomment-178142661
  
hi @steveloughran sorry for another really long delay on my end.  mostly 
this looks fine, there are some style nits and a couple of comments that need 
updating.  I also looked into the query param thing -- I played with it a bit 
more and realized its kind of nuisance to test, but I did write a test on 
`ApplicationCache` for it which I'll send to you.

I just had one more concern with the last read-through -- what happens when 
an app goes from incomplete to complete?  I did some manual testing, and things 
seem to work.  But, I have a fear that there is some lingering state that isn't 
getting cleaned up.  I will try to walk through things more carefully but maybe 
you understand it well enough that you can reassure me (or perhaps you should 
take another look yourself as well ...)

and of course, there are now merge conflicts which need to be fixed, sorry.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10820][SQL] Support for the continuous ...

2016-02-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11006#issuecomment-178146926
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

2016-02-01 Thread zsxwing

Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/10838#issuecomment-178153656
  
Hm, I think `block` is also not `accurate`. Never mind, just update the 
title should be enough.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13113][Core] Remove unnecessary bit ope...

2016-02-01 Thread holdenk

Github user holdenk commented on the pull request:

https://github.com/apache/spark/pull/11002#issuecomment-178160294
  
This change looks good to me, maybe @JoshRosen who was the last to touch 
that line can take a look?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12992][SQL]: Update parquet reader to s...

2016-02-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10908#issuecomment-178168455
  
**[Test build #50499 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50499/consoleFull)**
 for PR 10908 at commit 
[`2ea2d54`](https://github.com/apache/spark/commit/2ea2d547c908aa4be289959c461b06fbbc0f9f63).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [ML][MINOR] Invalid MulticlassClassification r...

2016-02-01 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10996


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12832][MESOS] mesos scheduler respect a...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/10949#issuecomment-178174598
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12992][SQL]: Update parquet reader to s...

2016-02-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10908#issuecomment-178179904
  
**[Test build #50499 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50499/consoleFull)**
 for PR 10908 at commit 
[`2ea2d54`](https://github.com/apache/spark/commit/2ea2d547c908aa4be289959c461b06fbbc0f9f63).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10749][MESOS] Support multiple roles wi...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8872#discussion_r51480535
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala
 ---
@@ -456,34 +460,36 @@ private[spark] class MesosClusterScheduler(
   private def scheduleTasks(
   candidates: Seq[MesosDriverDescription],
   afterLaunchCallback: (String) => Boolean,
-  currentOffers: List[ResourceOffer],
-  tasks: mutable.HashMap[OfferID, ArrayBuffer[TaskInfo]]): Unit = {
+  currentOffers: JList[ResourceOffer],
+  tasks: mutable.HashMap[OfferID, ArrayBuffer[TaskInfo]]): 
JList[ResourceOffer] = {
--- End diff --

what does this return? You need to document this in the javadoc


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: java mapwithstate, broken java mapping

2016-02-01 Thread gabrielenizzoli

GitHub user gabrielenizzoli opened a pull request:

https://github.com/apache/spark/pull/11007

java mapwithstate, broken java mapping

java mapwithstate with Function3 has wrong conversion of java Optional to 
scala Option, now code uses same conversion used in the mapwithstate call that 
uses Function4 as an input

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gabrielenizzoli/spark branch-1.6

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11007.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11007


commit 78c0323f90d1f31796732229aad84ab1f5f74332
Author: Gabriele Nizzoli 
Date:   2016-02-01T21:03:57Z

java mapwithstate broken java mapping

java mapwithstate with Function3 has wrong conversion of java Optional to 
scala Option




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12798] [SQL] generated BroadcastHashJoi...

2016-02-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10989#issuecomment-178147379
  
**[Test build #2486 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2486/consoleFull)**
 for PR 10989 at commit 
[`c1c0588`](https://github.com/apache/spark/commit/c1c0588053af5aa359b6d03bac6c5d0b198c5b69).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12790][CORE] Remove HistoryServer old m...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10860#discussion_r51472679
  
--- Diff: .rat-excludes ---
@@ -63,14 +63,15 @@ logs
 .*dependency-reduced-pom.xml
 known_translations
 json_expectation
-local-1422981759269/*
-local-1422981780767/*
-local-1425081759269/*
-local-1426533911241/*
-local-1426633911242/*
-local-1430917381534/*
+local-1422981759269
+local-1422981780767
+local-1425081759269
+local-1426533911241
+local-1426633911242
+local-1430917381534
 local-1430917381535_1
 local-1430917381535_2
+local-1425081759268/*
--- End diff --

what does this mean? Shouldn't we just remove this line and delete the old 
logs that we're not even using anymore?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10820][SQL] Support for the continuous ...

2016-02-01 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/11006#discussion_r51476990
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
 ---
@@ -0,0 +1,213 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming
+
+import java.lang.Thread.UncaughtExceptionHandler
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.Logging
+import org.apache.spark.sql.{ContinuousQuery, DataFrame, SQLContext}
+import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap}
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, 
LogicalPlan}
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.execution.QueryExecution
+
+/**
+ * Manages the execution of a streaming Spark SQL query that is occurring 
in a separate thread.
+ * Unlike a standard query, a streaming query executes repeatedly each 
time new data arrives at any
+ * [[Source]] present in the query plan. Whenever new data arrives, a 
[[QueryExecution]] is created
+ * and the results are committed transactionally to the given [[Sink]].
+ */
+class StreamExecution(
+sqlContext: SQLContext,
+private[sql] val logicalPlan: LogicalPlan,
+val sink: Sink) extends ContinuousQuery with Logging {
+
+  /** An monitor used to wait/notify when batches complete. */
+  val awaitBatchLock = new Object
+
+  @volatile
+  var batchRun = false
+
+  /** Minimum amount of time in between the start of each batch. */
+  val minBatchTime = 10
+
+  /** Tracks how much data we have processed from each input source. */
+  private[sql] val currentOffsets = new StreamProgress
+
+  /** All stream sources present the query plan. */
+  private val sources =
+logicalPlan.collect { case s: StreamingRelation => s.source }
+
+  // Start the execution at the current offsets stored in the sink. (i.e. 
avoid reprocessing data
+  // that we have already processed).
+  {
+sink.currentOffset match {
+  case Some(c: CompositeOffset) =>
+val storedProgress = c.offsets
+val sources = logicalPlan collect {
+  case StreamingRelation(source, _) => source
+}
+
+assert(sources.size == storedProgress.size)
+sources.zip(storedProgress).foreach { case (source, offset) =>
+  offset.foreach(currentOffsets.update(source, _))
+}
+  case None => // We are starting this stream for the first time.
+  case _ => throw new IllegalArgumentException("Expected composite 
offset from sink")
+}
+  }
+
+  logInfo(s"Stream running at $currentOffsets")
+
+  /** When false, signals to the microBatchThread that it should stop 
running. */
+  @volatile private var shouldRun = true
+
+  // TODO: add exception handling to batch thread
+  /** The thread that runs the micro-batches of this stream. */
+  private[sql] val microBatchThread = new Thread("stream execution 
thread") {
+override def run(): Unit = {
+  SQLContext.setActive(sqlContext)
+  while (shouldRun) {
+attemptBatch()
+Thread.sleep(minBatchTime) // TODO: Could be tighter
+  }
+}
+  }
+  microBatchThread.setDaemon(true)
+  microBatchThread.start()
+
+  @volatile
+  private[sql] var lastExecution: QueryExecution = null
+  @volatile
+  private[sql] var streamDeathCause: Throwable = null
+
+  microBatchThread.setUncaughtExceptionHandler(
--- End diff --

Could you move this statement above `microBatchThread.start()`? Otherwise, 
we may miss some exception happening between `microBatchThread.start()` and 
`microBatchThread.setUncaughtExceptionHandler`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project

[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

2016-02-01 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10844#discussion_r51478445
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
 ---
@@ -180,6 +221,46 @@ case class Join(
 }
   }
 
+  private def constructIsNotNullConstraints(condition: Expression): 
Set[Expression] = {
+// Currently we only propagate constraints if the condition consists 
of equality
+// and ranges. For all other cases, we return an empty set of 
constraints
+splitConjunctivePredicates(condition).map {
+  case EqualTo(l, r) =>
+Set(IsNotNull(l), IsNotNull(r))
+  case GreaterThan(l, r) =>
+Set(IsNotNull(l), IsNotNull(r))
+  case GreaterThanOrEqual(l, r) =>
+Set(IsNotNull(l), IsNotNull(r))
+  case LessThan(l, r) =>
+Set(IsNotNull(l), IsNotNull(r))
+  case LessThanOrEqual(l, r) =>
+Set(IsNotNull(l), IsNotNull(r))
+  case _ =>
+Set.empty[Expression]
+}.foldLeft(Set.empty[Expression])(_ union _.toSet)
+  }
+
+  override protected def validConstraints: Set[Expression] = {
+joinType match {
+  case Inner if condition.isDefined =>
+left.constraints
+  .union(right.constraints)
+  .union(constructIsNotNullConstraints(condition.get))
+  case LeftSemi if condition.isDefined =>
+left.constraints
+  .union(right.constraints)
+  .union(constructIsNotNullConstraints(condition.get))
+  case LeftOuter =>
+left.constraints
+  case RightOuter =>
+right.constraints
+  case FullOuter =>
+Set.empty[Expression]
+}
+  }
+
+  def selfJoinResolved: Boolean = 
left.outputSet.intersect(right.outputSet).isEmpty
--- End diff --

I think this was a merging mistake as its duplicated with the method below.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12832][MESOS] mesos scheduler respect a...

2016-02-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10949#issuecomment-178181958
  
**[Test build #50500 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50500/consoleFull)**
 for PR 10949 at commit 
[`7c6650d`](https://github.com/apache/spark/commit/7c6650dceed4eaf9e9f50542c76c88fccf7ac4a2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10820][SQL] Support for the continuous ...

2016-02-01 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/11006#discussion_r51478334
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/memory.scala 
---
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming
+
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.{Logging, SparkEnv}
+import org.apache.spark.sql.{DataFrame, Dataset, Encoder, Row, SQLContext}
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.encoders.{encoderFor, RowEncoder}
+import org.apache.spark.sql.types.StructType
+
+object MemoryStream {
+  protected val currentBlockId = new AtomicInteger(0)
+  protected val memoryStreamId = new AtomicInteger(0)
+
+  def apply[A : Encoder](implicit sqlContext: SQLContext): MemoryStream[A] 
=
+new MemoryStream[A](memoryStreamId.getAndIncrement(), sqlContext)
+}
+
+/**
+ * A [[Source]] that produces value stored in memory as they are added by 
the user.  This [[Source]]
+ * is primarily intended for use in unit tests as it can only replay data 
when the object is still
+ * available.
+ */
+case class MemoryStream[A : Encoder](id: Int, sqlContext: SQLContext)
+extends Source with Logging {
+  protected val encoder = encoderFor[A]
+  protected val logicalPlan = StreamingRelation(this)
+  protected val output = logicalPlan.output
+  protected val batches = new ArrayBuffer[Dataset[A]]
+  protected var currentOffset: LongOffset = new LongOffset(-1)
+
+  protected def blockManager = SparkEnv.get.blockManager
+
+  def schema: StructType = encoder.schema
+
+  def getCurrentOffset: Offset = currentOffset
+
+  def toDS()(implicit sqlContext: SQLContext): Dataset[A] = {
+new Dataset(sqlContext, logicalPlan)
+  }
+
+  def toDF()(implicit sqlContext: SQLContext): DataFrame = {
+new DataFrame(sqlContext, logicalPlan)
+  }
+
+  def addData(data: TraversableOnce[A]): Offset = {
+import sqlContext.implicits._
+this.synchronized {
+  currentOffset = currentOffset + 1
+  val ds = data.toVector.toDS()
+  logDebug(s"Adding ds: $ds")
+  batches.append(ds)
+  currentOffset
+}
+  }
+
+  override def getNextBatch(start: Option[Offset]): Option[Batch] = 
synchronized {
+val newBlocks =
+  batches.drop(
+
start.map(_.asInstanceOf[LongOffset]).getOrElse(LongOffset(-1)).offset.toInt + 
1)
+
+if (newBlocks.nonEmpty) {
+  logDebug(s"Running [$start, $currentOffset] on blocks 
${newBlocks.mkString(", ")}")
+  val df = newBlocks
+  .map(_.toDF())
+  .reduceOption(_ unionAll _)
+  .getOrElse(sqlContext.emptyDataFrame)
+
+  Some(new Batch(currentOffset, df))
+} else {
+  None
+}
+  }
+
+  override def toString: String = s"MemoryStream[${output.mkString(",")}]"
+}
+
+/**
+ * A sink that stores the results in memory. This [[Sink]] is primarily 
intended for use in unit
+ * tests and does not provide durablility.
+ */
+class MemorySink(schema: StructType) extends Sink with Logging {
+  /** An order list of batches that have been written to this [[Sink]]. */
+  private var batches = new ArrayBuffer[Batch]()
+
+  /** Used to convert an [[InternalRow]] to an external [[Row]] for 
comparison in testing. */
+  private val externalRowConverter = RowEncoder(schema)
+
+  override def currentOffset: Option[Offset] = synchronized {
+batches.lastOption.map(_.end)
+  }
+
+  override def addBatch(nextBatch: Batch): Unit = synchronized {
+batches.append(nextBatch)
+  }
+
+  /** Returns all rows that are stored in this [[Sink]]. */
+  def allData: Seq[Row] = synchronized

[GitHub] spark pull request: [SPARK-10820][SQL] Support for the continuous ...

2016-02-01 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11006#issuecomment-178190775
  
**[Test build #50498 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50498/consoleFull)**
 for PR 11006 at commit 
[`7147b3f`](https://github.com/apache/spark/commit/7147b3f8b25e73102a8ba98d81c175acd74f49ca).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10749][MESOS] Support multiple roles wi...

2016-02-01 Thread andrewor14

Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/8872#issuecomment-178195492
  
@tnachen echoing my comment from another PR: it seems that this feature is 
already supported in client mode but not in cluster mode. Is there something we 
can do about this divergence in behavior? Is there some duplicate code to clean 
up so we don't run into something like this in the future?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10820][SQL] Support for the continuous ...

2016-02-01 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/11006#discussion_r51483601
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StreamTest.scala ---
@@ -0,0 +1,346 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.lang.Thread.UncaughtExceptionHandler
+
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+import scala.util.Random
+
+import org.scalatest.concurrent.Timeouts
+import org.scalatest.time.SpanSugar._
+
+import org.apache.spark.sql.catalyst.encoders.{encoderFor, 
ExpressionEncoder, RowEncoder}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.execution.streaming._
+
+/**
+ * A framework for implementing tests for streaming queries and sources.
+ *
+ * A test consists of a set of steps (expressed as a `StreamAction`) that 
are executed in order,
+ * blocking as necessary to let the stream catch up.  For example, the 
following adds some data to
+ * a stream, blocking until it can verify that the correct values are 
eventually produced.
+ *
+ * {{{
+ *  val inputData = MemoryStream[Int]
+val mapped = inputData.toDS().map(_ + 1)
+
+testStream(mapped)(
+  AddData(inputData, 1, 2, 3),
+  CheckAnswer(2, 3, 4))
+ * }}}
+ *
+ * Note that while we do sleep to allow the other thread to progress 
without spinning,
+ * `StreamAction` checks should not depend on the amount of time spent 
sleeping.  Instead they
+ * should check the actual progress of the stream before verifying the 
required test condition.
+ *
+ * Currently it is assumed that all streaming queries will eventually 
complete in 10 seconds to
+ * avoid hanging forever in the case of failures. However, individual 
suites can change this
+ * by overriding `streamingTimeout`.
+ */
+trait StreamTest extends QueryTest with Timeouts {
+
+  implicit class RichSource(s: Source) {
+def toDF(): DataFrame = new DataFrame(sqlContext, StreamingRelation(s))
+  }
+
+  /** How long to wait for an active stream to catch up when checking a 
result. */
+  val streamingTimout = 10.seconds
+
+  /** A trait for actions that can be performed while testing a streaming 
DataFrame. */
+  trait StreamAction
+
+  /** A trait to mark actions that require the stream to be actively 
running. */
+  trait StreamMustBeRunning
+
+  /**
+   * Adds the given data to the stream.  Subsuquent check answers will 
block until this data has
+   * been processed.
+   */
+  object AddData {
+def apply[A](source: MemoryStream[A], data: A*): AddDataMemory[A] =
+  AddDataMemory(source, data)
+  }
+
+  /** A trait that can be extended when testing other sources. */
+  trait AddData extends StreamAction {
+def source: Source
+
+/**
+ * Called to trigger adding the data.  Should return the offset that 
will denote when this
+ * new data has been processed.
+ */
+def addData(): Offset
+  }
+
+  case class AddDataMemory[A](source: MemoryStream[A], data: Seq[A]) 
extends AddData {
+override def toString: String = s"AddData to $source: 
${data.mkString(",")}"
+
+override def addData(): Offset = {
+  source.addData(data)
+}
+  }
+
+  /**
+   * Checks to make sure that the current data stored in the sink matches 
the `expectedAnswer`.
+   * This operation automatically blocks untill all added data has been 
processed.
+   */
+  object CheckAnswer {
+def apply[A : Encoder](data: A*): CheckAnswerRows = {
+  val encoder = encoderFor[A]
+  val toExternalRow = RowEncoder(encoder.schema)
+  CheckAnswerRows(data.map(d => 
toExternalRow.fromRow(encoder.toRow(d
+}
+
+def apply(rows: Row*):

[GitHub] spark pull request: [test-maven] Shade protobuf-java

2016-02-01 Thread tedyu

Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/10995#issuecomment-178202908
  
This would still benefit Spark standalone and Spark on Mesos, right ?

For Spark on YARN, status quo is maintained.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10820][SQL] Support for the continuous ...

2016-02-01 Thread zsxwing

Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/11006#issuecomment-178205092
  
Looks great! Just some minor comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 3 4 5 6 7 >

1 - 100 of 676 matches

Mail list logo