date:20161017

[GitHub] spark pull request #15376: [SPARK-17796][SQL] Support wildcard character in ...

2016-10-17 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/15376#discussion_r83707763
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -246,7 +247,28 @@ case class LoadDataCommand(
 val loadPath =
   if (isLocal) {
 val uri = Utils.resolveURI(path)
-if (!new File(uri.getPath()).exists()) {
+val filePath = uri.getPath()
+val exists = if (filePath.contains("*")) {
+  val fileSystem = FileSystems.getDefault
+  val pathPattern = fileSystem.getPath(filePath)
+  val dir = pathPattern.getParent.toString
+  val filePattern = pathPattern.getName(pathPattern.getNameCount - 1).toString
--- End diff --

Thanks. I'll use that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #15376: [SPARK-17796][SQL] Support wildcard character in ...

2016-10-17 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/15376#discussion_r83707938
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala 
---
@@ -1886,6 +1887,37 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
 }
   }
 
+  test("SPARK-17796 Support wildcard character in filename for LOAD DATA LOCAL INPATH") {
+withTempDir { dir =>
+  for (i <- 1 to 3) {
+val writer = new PrintWriter(new File(s"$dir/part-r-$i"))
--- End diff --

Sure, I'll use Guava one here, too.



[GitHub] spark issue #15316: [SPARK-17751] [SQL] Remove spark.sql.eagerAnalysis and O...

2016-10-17 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/15316
  
@gatorsmile I cannot merge this into 2.0. Can you open a backport for this?



[GitHub] spark pull request #15376: [SPARK-17796][SQL] Support wildcard character in ...

2016-10-17 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/15376#discussion_r83710046
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -246,7 +247,28 @@ case class LoadDataCommand(
 val loadPath =
   if (isLocal) {
 val uri = Utils.resolveURI(path)
-if (!new File(uri.getPath()).exists()) {
+val filePath = uri.getPath()
+val exists = if (filePath.contains("*")) {
+  val fileSystem = FileSystems.getDefault
+  val pathPattern = fileSystem.getPath(filePath)
+  val dir = pathPattern.getParent.toString
+  val filePattern = pathPattern.getName(pathPattern.getNameCount - 1).toString
+  if (dir.contains("*")) {
+throw new AnalysisException(
+  s"LOAD DATA input path allows only filename wildcard: $path")
+  }
+
+  val files = new File(dir).listFiles()
+  if (files == null) {
+false
+  } else {
+val matcher = fileSystem.getPathMatcher("glob:" + filePattern)
--- End diff --

Yes. It matches the whole absolute path.
```scala
scala> val fs = java.nio.file.FileSystems.getDefault
fs: java.nio.file.FileSystem = sun.nio.fs.MacOSXFileSystem@782dc5

scala> fs.getPathMatcher("glob:/x/1.dat").matches(fs.getPath("/x/1.dat"))
res0: Boolean = true

scala> fs.getPathMatcher("glob:/x/*.dat").matches(fs.getPath("/x/1.dat"))
res1: Boolean = true
```
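As a small sketch of why the scope of the pattern matters (the object and method names below are illustrative and not part of the patch): a glob built from the file name alone does not match an absolute path, because `*` does not cross directory separators. The matcher therefore has to be applied either to the whole path with a whole-path pattern, or to `getFileName` with a name-only pattern.

```scala
import java.nio.file.{FileSystems, Path}

// Hypothetical helper, for illustration only.
object GlobSketch {
  private val fs = FileSystems.getDefault

  // Matches the pattern against the path exactly as given (directories included).
  def matchesWholePath(pattern: String, path: Path): Boolean =
    fs.getPathMatcher("glob:" + pattern).matches(path)

  // Matches the pattern against the last path element only.
  def matchesFileName(pattern: String, path: Path): Boolean =
    fs.getPathMatcher("glob:" + pattern).matches(path.getFileName)
}
```

With `p = java.nio.file.Paths.get("/x/1.dat")`, `GlobSketch.matchesWholePath("*.dat", p)` is false, while `GlobSketch.matchesFileName("*.dat", p)` is true.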



[GitHub] spark issue #15493: [SPARK-17946][PYSPARK] Python crossJoin API similar to S...

2016-10-17 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15493
  
Why not just introduce a crossJoin function in R, similar to 
Python/Scala/Java?

We don't want to change the default join type, because it is still valid to 
run an inner join by specifying a predicate later using the filter operator.
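To illustrate the second point with plain Scala collections rather than the Spark API (the `Emp` and `Dept` types and both method names are made up for this sketch): a join with no condition followed by a filter yields the same rows as an inner join on that predicate, which is why the condition-less join cannot simply be treated as an error.

```scala
// Analogy only: Seq stands in for a DataFrame, a for-comprehension for a join.
object JoinSketch {
  case class Emp(name: String, deptId: Int)
  case class Dept(id: Int, title: String)

  // Join without a condition (all pairs), with the predicate applied afterwards.
  def crossThenFilter(es: Seq[Emp], ds: Seq[Dept]): Seq[(Emp, Dept)] =
    (for (e <- es; d <- ds) yield (e, d)).filter { case (e, d) => e.deptId == d.id }

  // Inner join with the predicate applied while pairing.
  def innerJoin(es: Seq[Emp], ds: Seq[Dept]): Seq[(Emp, Dept)] =
    for (e <- es; d <- ds if e.deptId == d.id) yield (e, d)
}
```

Both methods return the same pairs in the same order for any input.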




[GitHub] spark pull request #15335: [SPARK-17769][Core][Scheduler]Some FetchFailure r...

2016-10-17 Thread markhamstra

Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/15335#discussion_r83710471
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -1255,27 +1255,46 @@ class DAGScheduler(
   s"longer running")
   }
 
-  if (disallowStageRetryForTest) {
-abortStage(failedStage, "Fetch failure will not retry stage due to testing config",
-  None)
-  } else if (failedStage.failedOnFetchAndShouldAbort(task.stageAttemptId)) {
-abortStage(failedStage, s"$failedStage (${failedStage.name}) " +
-  s"has failed the maximum allowable number of " +
-  s"times: ${Stage.MAX_CONSECUTIVE_FETCH_FAILURES}. " +
-  s"Most recent failure reason: ${failureMessage}", None)
-  } else {
-if (failedStages.isEmpty) {
-  // Don't schedule an event to resubmit failed stages if failed isn't empty, because
-  // in that case the event will already have been scheduled.
-  // TODO: Cancel running tasks in the stage
-  logInfo(s"Resubmitting $mapStage (${mapStage.name}) and " +
-s"$failedStage (${failedStage.name}) due to fetch failure")
-  messageScheduler.schedule(new Runnable {
-override def run(): Unit = eventProcessLoop.post(ResubmitFailedStages)
-  }, DAGScheduler.RESUBMIT_TIMEOUT, TimeUnit.MILLISECONDS)
+  val shouldAbortStage =
+failedStage.failedOnFetchAndShouldAbort(task.stageAttemptId) ||
+disallowStageRetryForTest
+
+  if (shouldAbortStage) {
+val abortMessage = if (disallowStageRetryForTest) {
+  "Fetch failure will not retry stage due to testing config"
+} else {
+  s"""$failedStage (${failedStage.name})
+ |has failed the maximum allowable number of
+ |times: ${Stage.MAX_CONSECUTIVE_FETCH_FAILURES}.
+ |Most recent failure reason: $failureMessage""".stripMargin.replaceAll("\n", " ")
 }
+abortStage(failedStage, abortMessage, None)
+  } else { // update failedStages and make sure a ResubmitFailedStages event is enqueued
+// TODO: Cancel running tasks in the failed stage -- cf. SPARK-17064
+val noResubmitEnqueued = !failedStages.contains(failedStage)

Ok, but it's really not that complicated or difficult to understand.

There is only one way to add stages to `failedStages`: within the 
`FetchFailed` case.  When a `failedStage` is added to `failedStages`, it is 
always accompanied by the parent `mapStage`.

There are only two ways to remove stages from `failedStages`: 1) within the 
handling of a `ResubmitFailedStages` event, when the entire `failedStages` is 
cleared; 2) within `cleanupStateForJobAndIndependentStages` when we call 
`removeStage`.  Obviously, 1) can't produce a state where `mapStage` is not in 
`failedStages` while a corresponding `failedStage` is, so the only logic we need 
to concern ourselves with is in 2).

In order for 2) to produce a state where `mapStage` is absent from 
`failedStages` while an associated `failedStage` is present, `removeStage` 
would need to have been called on the `mapStage` while not being called on the 
`failedStage`.  But that can't happen because `removeStage` will not be called 
on a stage unless no Job needs that stage anymore.  If no job needs the 
`mapStage`, then no job can need a `failedStage` that uses the output of that 
`mapStage` -- i.e. it is not possible that a `mapStage` will be removed in 
`cleanupStateForJobAndIndependentStages` unless every associated `failedStage` 
will also be removed.

Conclusion: It is never possible for `mapStage` to be absent from 
`failedStages` at the same time that `failedStage` is present, so the proposed 
`|| !failedStages.contains(mapStage)` condition will never be checked -- it 
would just be unreachable and misleading code.

There also isn't really any need for concern over lack of tests.  There is 
no need to prove correctness of the current code for something that can't 
happen presently, so the only point of such a test would be to guard against 
some future mistaken change making it possible to remove a failed `mapStage` 
while some `failedStage` still needs it.  If that happens, then we've got far 
bigger problems than checking whether we need to issue a new 
`ResubmitFailedStages` event, and checks for that kind of broken removal of 
parents while their children are still depending on them should be covered in 
the tests of `cleanupStateForJobAndIndependentStages`.
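The argument above can be played out on a toy model. This is only a sketch under simplifying assumptions: `Stage`, `Tracker`, and the method names are invented for illustration and are not the `DAGScheduler` code, and the `removeStage` path is left out entirely. It models the two operations discussed: a fetch failure adds the failed stage together with its parent map stage, and a resubmit clears the whole set.

```scala
import scala.collection.mutable

// Hypothetical model of the failedStages bookkeeping, for illustration only.
object FailedStagesSketch {
  final case class Stage(id: Int, name: String)

  class Tracker {
    private val failedStages = mutable.HashSet.empty[Stage]

    // Models the FetchFailed case: both stages are always added together.
    // Returns true when no resubmit event was pending for failedStage yet.
    def onFetchFailed(failedStage: Stage, mapStage: Stage): Boolean = {
      val noResubmitEnqueued = !failedStages.contains(failedStage)
      failedStages += failedStage
      failedStages += mapStage
      noResubmitEnqueued
    }

    // Models ResubmitFailedStages handling: the entire set is cleared at once.
    def onResubmit(): Set[Stage] = {
      val drained = failedStages.toSet
      failedStages.clear()
      drained
    }

    // The invariant argued for: failedStage present implies mapStage present.
    def invariantHolds(failedStage: Stage, mapStage: Stage): Boolean =
      !failedStages.contains(failedStage) || failedStages.contains(mapStage)
  }
}
```

In this model the invariant holds after any sequence of the two operations, so an extra `|| !failedStages.contains(mapStage)` check would never change the outcome.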



[GitHub] spark issue #15417: [SPARK-17851][SQL][TESTS] Make sure all test sqls in cat...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15417
  
**[Test build #67077 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67077/consoleFull)**
 for PR 15417 at commit 
[`005ff36`](https://github.com/apache/spark/commit/005ff36694169000cbabce6ddbbb3081e94b10c5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class DecimalPrecisionSuite extends AnalysisTest with BeforeAndAfter `
  * `class TypeCoercionSuite extends AnalysisTest `



[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15376
  
**[Test build #67083 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67083/consoleFull)**
 for PR 15376 at commit 
[`c74191b`](https://github.com/apache/spark/commit/c74191ba1a3c0867b92953f3320716f93853db56).



[GitHub] spark issue #15417: [SPARK-17851][SQL][TESTS] Make sure all test sqls in cat...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15417
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67077/
Test FAILed.



[GitHub] spark issue #15417: [SPARK-17851][SQL][TESTS] Make sure all test sqls in cat...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15417
  
Merged build finished. Test FAILed.



[GitHub] spark pull request #15376: [SPARK-17796][SQL] Support wildcard character in ...

2016-10-17 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/15376#discussion_r83712851
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -246,7 +247,28 @@ case class LoadDataCommand(
 val loadPath =
   if (isLocal) {
 val uri = Utils.resolveURI(path)
-if (!new File(uri.getPath()).exists()) {
+val filePath = uri.getPath()
+val exists = if (filePath.contains("*")) {
+  val fileSystem = FileSystems.getDefault
+  val pathPattern = fileSystem.getPath(filePath)
+  val dir = pathPattern.getParent.toString
+  val filePattern = pathPattern.getName(pathPattern.getNameCount - 1).toString
+  if (dir.contains("*")) {
+throw new AnalysisException(
+  s"LOAD DATA input path allows only filename wildcard: $path")
+  }
+
+  val files = new File(dir).listFiles()
+  if (files == null) {
+false
+  } else {
+val matcher = fileSystem.getPathMatcher("glob:" + filePattern)
--- End diff --

Ah, I think I missed your point. I will update the code to use absolute 
path here, too.



[GitHub] spark pull request #15472: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-17 Thread tdas

Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15472#discussion_r83713099
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryListener.scala
 ---
@@ -84,27 +84,27 @@ object StreamingQueryListener {
* @since 2.0.0
*/
   @Experimental
-  class QueryStarted private[sql](val queryInfo: StreamingQueryInfo) extends Event
+  class QueryStarted private[sql](val queryStatus: StreamingQueryStatus) extends Event
--- End diff --

Okay. I will do that in a follow-up PR after this PR goes in and makes 
branch-2.0 consistent with master.



[GitHub] spark pull request #15518: [SPARK-17974] Refactor FileCatalog classes to sim...

2016-10-17 Thread ericl

GitHub user ericl opened a pull request:

https://github.com/apache/spark/pull/15518

[SPARK-17974] Refactor FileCatalog classes to simplify the inheritance tree

## What changes were proposed in this pull request?

This renames `BasicFileCatalog => FileCatalog`, combines  
`SessionFileCatalog` with `PartitioningAwareFileCatalog`, and removes the old 
`FileCatalog` trait.

In summary,
```
MetadataLogFileCatalog extends PartitioningAwareFileCatalog
ListingFileCatalog extends PartitioningAwareFileCatalog
PartitioningAwareFileCatalog extends FileCatalog
TableFileCatalog extends FileCatalog
```

cc @cloud-fan @mallman 

## How was this patch tested?

Existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ericl/spark refactor-session-file-catalog

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15518.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15518


commit 798a8820c074e17b1eb17ca4446c7eb0969c63c8
Author: Eric Liang 
Date:   2016-10-17T18:30:30Z

move files

commit db8a669bc1c51fef97e231cc5fec4eb9523b9c47
Author: Eric Liang 
Date:   2016-10-17T18:31:42Z

Mon Oct 17 11:31:42 PDT 2016

commit e2d8c1d850bfaa30dd4b598d284f6068044c46cd
Author: Eric Liang 
Date:   2016-10-17T19:05:47Z

clean up classes

commit 5d390c97f61af4d84c194344d8dd4c86fb41eb32
Author: Eric Liang 
Date:   2016-10-17T19:08:28Z

merge suites





[GitHub] spark issue #15515: [SPARK-17970][SQL][WIP] store partition spec in metastor...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15515
  
**[Test build #67076 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67076/consoleFull)**
 for PR 15515 at commit 
[`ceac57b`](https://github.com/apache/spark/commit/ceac57b0aa72b6f29de63e61eb4c5073b98243a0).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #15472: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-17 Thread tdas

Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15472#discussion_r83713217
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala
 ---
@@ -77,6 +77,9 @@ trait StateStore {
*/
   def updates(): Iterator[StoreUpdate]
 
+  /** Number of keys in the state store */
+  def numKeys(): Long
--- End diff --

This is in an internal API, so it doesn't really matter. I can change it in the 
follow-up PR.



[GitHub] spark issue #15515: [SPARK-17970][SQL][WIP] store partition spec in metastor...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15515
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15515: [SPARK-17970][SQL][WIP] store partition spec in metastor...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15515
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67076/
Test FAILed.



[GitHub] spark issue #15518: [SPARK-17974] Refactor FileCatalog classes to simplify t...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15518
  
**[Test build #67084 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67084/consoleFull)**
 for PR 15518 at commit 
[`5d390c9`](https://github.com/apache/spark/commit/5d390c97f61af4d84c194344d8dd4c86fb41eb32).



[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15376
  
**[Test build #67085 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67085/consoleFull)**
 for PR 15376 at commit 
[`933ad85`](https://github.com/apache/spark/commit/933ad856c7fb3712a39557e09da4bc22b75b905c).



[GitHub] spark issue #15472: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-17 Thread tdas

Github user tdas commented on the issue:

https://github.com/apache/spark/pull/15472
  
@rxin @marmbrus I am thinking of merging this PR (with the ignored flaky test) 
to make branch-2.0 consistent with master, so that follow-up PRs (flaky test 
fix, further API changes) can be merged into both master and branch-2.0. Any 
objections?



[GitHub] spark issue #15472: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15472
  
**[Test build #67086 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67086/consoleFull)**
 for PR 15472 at commit 
[`1a32b39`](https://github.com/apache/spark/commit/1a32b396a3ea928ed3b6882aad50dce59ccf7c47).



[GitHub] spark issue #15497: [Test][SPARK-16002][Follow-up] Fix flaky test in Streami...

2016-10-17 Thread tdas

Github user tdas commented on the issue:

https://github.com/apache/spark/pull/15497
  
I thought about it, and I still don't like this design. It adds more 
complexity to a general class, ManualClock, for functionality needed only by 
StreamExecution. And that leads to this sort of question: should a general 
feature like `isThreadWaiting` work with multiple threads, etc.?

I think we need to do it differently. I think it's best to create a custom 
ManualClock for StreamExecution, which adds the functionality necessary for 
StreamExecution.

Mind if I take over this PR and work this out (in the interest of time, 
2.0.2 cutoff is imminent)?
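A rough sketch of that shape, assuming nothing about the eventual implementation: `SimpleManualClock` is a toy stand-in for a general manual clock (it is not Spark's `ManualClock`), and `WaitTrackingClock` and `isWaitingFor` are hypothetical names. The point is only that the wait-tracking state lives in the specialised subclass, so the general clock stays unchanged.

```scala
// Illustration only; all names here are hypothetical.
object StreamClockSketch {
  // Toy general-purpose manual clock: time moves only when advance() is called.
  class SimpleManualClock(private var now: Long = 0L) {
    def getTimeMillis(): Long = synchronized { now }

    def advance(ms: Long): Unit = synchronized {
      now += ms
      notifyAll() // wake any thread blocked in waitTillTime
    }

    // Blocks until the clock reaches target, then returns the current time.
    def waitTillTime(target: Long): Long = synchronized {
      while (now < target) wait(10)
      now
    }
  }

  // Specialised clock for a single consumer: records the target being waited on.
  class WaitTrackingClock(start: Long = 0L) extends SimpleManualClock(start) {
    private var waitTarget: Option[Long] = None

    override def waitTillTime(target: Long): Long = {
      synchronized { waitTarget = Some(target) }
      try super.waitTillTime(target)
      finally synchronized { waitTarget = None }
    }

    // True while some thread is blocked waiting for exactly this target.
    def isWaitingFor(target: Long): Boolean = synchronized {
      waitTarget.contains(target)
    }
  }
}
```

Because the subclass tracks a single target, it sidesteps the multi-thread question for the general class; a test can poll `isWaitingFor` from another thread before calling `advance`.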



[GitHub] spark issue #15513: [WIP][SPARK-17963][SQL][Documentation] Add examples (ext...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15513
  
**[Test build #67071 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67071/consoleFull)**
 for PR 15513 at commit 
[`2059374`](https://github.com/apache/spark/commit/2059374537496c9f81512b643e3ec084e43e2594).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Skewness(expr: Expression) extends CentralMomentAgg(expr) `



[GitHub] spark issue #15316: [SPARK-17751] [SQL] Remove spark.sql.eagerAnalysis and O...

2016-10-17 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15316
  
Sure, will do it soon. Thanks!



[GitHub] spark issue #15513: [WIP][SPARK-17963][SQL][Documentation] Add examples (ext...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15513
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67071/
Test FAILed.



[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15376
  
**[Test build #67085 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67085/consoleFull)**
 for PR 15376 at commit 
[`933ad85`](https://github.com/apache/spark/commit/933ad856c7fb3712a39557e09da4bc22b75b905c).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15376
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15376
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67085/
Test FAILed.



[GitHub] spark issue #15513: [WIP][SPARK-17963][SQL][Documentation] Add examples (ext...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15513
  
Merged build finished. Test FAILed.



[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-17 Thread sethah

Github user sethah commented on the issue:

https://github.com/apache/spark/pull/9
  
I agree that saving the initialModel may not be practical, since it can be 
large. However, not saving that param at all also seems inconsistent to me. 
When we produce a model from an estimator, we copy over the params that were 
used to create the model. These params give an indication of how the model was 
created. If we completely disregard the initialModel when we save the model, 
then it will appear as though the model was not created with an initialModel. 
In fact, for kmeans, it would look like the model was created using the 
`k-means||` initialization strategy, since that is the default. This is 
misleading.

It would be nice to have a way to avoid saving the model with the initial 
model data, but still preserve the information about how the model was 
initialized. You could even argue that the initialModel should not be a param 
at all because of the edge cases it seems to introduce.
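
To make the second paragraph concrete, here is a minimal sketch of metadata that records how a model was initialized without carrying the initial model's data. All names here (`InitSource`, `SavedKMeansMetadata`, `InitMetadata`) are hypothetical and are not part of Spark's ML persistence API; the sketch only assumes that a small descriptor such as a uid and `k` is cheap to save.

```scala
// Hypothetical sketch: none of these types exist in Spark.
sealed trait InitSource
case object RandomInit extends InitSource
case object KMeansParallelInit extends InitSource
// Records only a small descriptor of the initial model, not its cluster centers.
final case class FromInitialModel(uid: String, k: Int) extends InitSource

final case class SavedKMeansMetadata(k: Int, maxIter: Int, init: InitSource)

object InitMetadata {
  // Human-readable description of how the saved model was initialized.
  def describe(meta: SavedKMeansMetadata): String = meta.init match {
    case RandomInit               => "random"
    case KMeansParallelInit       => "k-means||"
    case FromInitialModel(uid, k) => s"initialModel(uid=$uid, k=$k)"
  }
}
```

With something along these lines, a reloaded model would no longer look as if it had been created with the default `k-means||` strategy, while the potentially large initial model itself stays out of the saved output.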



[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-17 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/9#discussion_r83719858
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -81,6 +81,13 @@ private[clustering] trait KMeansParams extends Params 
with HasMaxIter with HasFe
   def getInitSteps: Int = $(initSteps)
 
   /**
+   * Param for KMeansModel to use for warm start.
--- End diff --

Actually, we also need to be clear that the `initMode` is entirely ignored 
when an initial model is set. Let's add that to the doc of the `initMode` param.
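
The precedence described here can be sketched roughly as follows. This is an illustration only: `InitChoice`, `WarmStart`, `ColdStart`, and `resolve` are hypothetical names, not the estimator's actual code, and cluster centers are modeled as plain vectors of doubles.

```scala
// Hypothetical sketch of the precedence rule; not the KMeans estimator's code.
object InitChoice {
  sealed trait Init
  final case class WarmStart(centers: Vector[Vector[Double]]) extends Init
  final case class ColdStart(initMode: String) extends Init

  // When initial centers are supplied, initMode is never consulted.
  def resolve(initMode: String, initialCenters: Option[Vector[Vector[Double]]]): Init =
    initialCenters match {
      case Some(centers) => WarmStart(centers)
      case None          => ColdStart(initMode)
    }
}
```

Under this reading, a user who sets both `initMode` and an initial model gets the warm start silently, which is why the behavior should be spelled out in the param documentation.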



[GitHub] spark issue #15517: [SPARK-17972][SQL] Cache analyzed plan instead of optimi...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15517
  
**[Test build #67081 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67081/consoleFull)**
 for PR 15517 at commit 
[`292ef36`](https://github.com/apache/spark/commit/292ef36a363ee4b2e0eac6e6686fe33c9b962120).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15517: [SPARK-17972][SQL] Cache analyzed plan instead of optimi...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15517
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15517: [SPARK-17972][SQL] Cache analyzed plan instead of optimi...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15517
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67081/
Test FAILed.



[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15481
  
**[Test build #67067 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67067/consoleFull)**
 for PR 15481 at commit 
[`2997ccb`](https://github.com/apache/spark/commit/2997ccb25dd1bb7dfcef44054f91d5d1132cd686).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15512: [SPARK-17930][CORE]The SerializerInstance instance used ...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15512
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67065/
Test FAILed.



[GitHub] spark issue #15512: [SPARK-17930][CORE]The SerializerInstance instance used ...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15512
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15481
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67067/
Test FAILed.



[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15481
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15497: [Test][SPARK-16002][Follow-up] Fix flaky test in Streami...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15497
  
**[Test build #67074 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67074/consoleFull)**
 for PR 15497 at commit 
[`7ae7782`](https://github.com/apache/spark/commit/7ae7782cdede0c3f2a3db0a09401cf0d682a264f).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15302: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...

2016-10-17 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15302
  
With today's master, the result looks like the following. Should we use an 
expression in `AlterTableDropPartitionCommand`?
```scala
org.apache.spark.sql.AnalysisException: cannot resolve '`country`' given input columns: []; line 1 pos 23;
'AlterTableDropPartitionCommand `sales`, [('country < KR)], false, false
```
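
For reference, the semantics a comparison in a partition spec would need can be sketched independently of the analyzer. The code below is a hypothetical illustration, not Spark's catalog or command API: `PartitionFilter`, `Predicate`, and the operator objects are made-up names, and it assumes partition values are compared as plain strings.

```scala
// Hypothetical sketch; not Spark's catalog API. Values are compared as strings.
object PartitionFilter {
  type PartitionSpec = Map[String, String]

  sealed trait Op
  case object Less extends Op
  case object Equal extends Op
  case object Greater extends Op

  final case class Predicate(column: String, op: Op, value: String)

  // A partition matches when it has the column and the comparison holds.
  def matches(spec: PartitionSpec, p: Predicate): Boolean =
    spec.get(p.column).exists { v =>
      val c = v.compareTo(p.value)
      p.op match {
        case Less    => c < 0
        case Equal   => c == 0
        case Greater => c > 0
      }
    }

  // Partitions that a command such as DROP PARTITION would act on.
  def select(partitions: Seq[PartitionSpec], p: Predicate): Seq[PartitionSpec] =
    partitions.filter(matches(_, p))
}
```

In this sketch, a predicate like `country < 'KR'` selects every partition whose `country` value sorts before `KR`. Whether the command should hold a resolved expression or a simpler structure like this one is exactly the open question.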



[GitHub] spark issue #15512: [SPARK-17930][CORE]The SerializerInstance instance used ...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15512
  
**[Test build #67065 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67065/consoleFull)**
 for PR 15512 at commit 
[`037871d`](https://github.com/apache/spark/commit/037871d8843760fbbdeab344d8228bfaeba6f6ae).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15497: [Test][SPARK-16002][Follow-up] Fix flaky test in Streami...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15497
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67074/
Test FAILed.



[GitHub] spark issue #15497: [Test][SPARK-16002][Follow-up] Fix flaky test in Streami...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15497
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15471: [SPARK-17919] Make timeout to RBackend configurable in S...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15471
  
**[Test build #67079 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67079/consoleFull)**
 for PR 15471 at commit 
[`6f15a15`](https://github.com/apache/spark/commit/6f15a1541f01429ae19237252c600b108722ecb4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15471: [SPARK-17919] Make timeout to RBackend configurable in S...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15471
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15471: [SPARK-17919] Make timeout to RBackend configurable in S...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15471
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67079/
Test FAILed.



[GitHub] spark issue #15410: [SPARK-17843][Web UI] Indicate event logs pending for pr...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15410
  
**[Test build #67070 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67070/consoleFull)**
 for PR 15410 at commit 
[`b43e241`](https://github.com/apache/spark/commit/b43e2412444f8da29f04ecab16a4955db1f8b35f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15410: [SPARK-17843][Web UI] Indicate event logs pending for pr...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15410
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15410: [SPARK-17843][Web UI] Indicate event logs pending for pr...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15410
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67070/
Test PASSed.



[GitHub] spark issue #14650: [SPARK-17062][MESOS] add conf option to mesos dispatcher

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14650
  
**[Test build #67066 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67066/consoleFull)**
 for PR 14650 at commit 
[`c322c27`](https://github.com/apache/spark/commit/c322c276ecbed352ea4dbf32df5b6f0d1f2c4347).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11105
  
**[Test build #67069 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67069/consoleFull)**
 for PR 11105 at commit 
[`007b52b`](https://github.com/apache/spark/commit/007b52bcf46b6701903c28a07b1b3a14b5f962f7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14650: [SPARK-17062][MESOS] add conf option to mesos dispatcher

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14650
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67066/
Test PASSed.



[GitHub] spark issue #14847: [SPARK-17254][SQL] Add StopAfter physical plan for the f...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14847
  
**[Test build #67073 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67073/consoleFull)**
 for PR 14847 at commit 
[`141dc51`](https://github.com/apache/spark/commit/141dc5152ed43563cee13d65e5f7d5d2e262d6f9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class FilterExecBase extends UnaryExecNode with 
CodegenSupport with PredicateHelper `
  * `case class FilterExec(val condition: Expression, val child: SparkPlan)`
  * `case class StopAfterExec(`



[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11105
  
Merged build finished. Test PASSed.



[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11105
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67069/
Test PASSed.



[GitHub] spark issue #14650: [SPARK-17062][MESOS] add conf option to mesos dispatcher

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14650
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14847: [SPARK-17254][SQL] Add StopAfter physical plan for the f...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14847
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67073/
Test PASSed.



[GitHub] spark issue #14847: [SPARK-17254][SQL] Add StopAfter physical plan for the f...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14847
  
Merged build finished. Test PASSed.



[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11105
  
**[Test build #67068 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67068/consoleFull)**
 for PR 11105 at commit 
[`e027d53`](https://github.com/apache/spark/commit/e027d5375474143cd0e44e3fba40d0bb29981eb1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11105
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-17 Thread shivaram

Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/15421
  
Thanks @wangmiao1981 - There are two different kinds of serialization that 
happen in SparkR. One is the RPC-style serialization, where function arguments 
are serialized using `writeDate`, `writeInt`, etc. The other is the batch or 
bulk serialization that we use when converting an R `data.frame` to Spark RDDs. 
This is used in the `createDataFrame` case from [1].

The way this is supposed to work is that the call to `lapply` and `getJRDD` 
[2] converts this into a row-wise serialized `SparkDataFrame`. To do this, on 
the executor side `unserialize` is called on the bulk data [3] and 
`writeRowSerialize` is called for each row [4]. So the final byte stream to 
look at is the one here. But my guess is that things are going wrong somewhere 
before this, i.e. the byte stream at [3] has some different type or something 
like that. To put it another way, are we sure `writeString` was called with 
`NA`, or was it some other function like `writeBin` because the types were 
wrong?

The other reason for such a transient bug might be that the channels are 
not getting flushed somewhere and this doesn't show up on some R versions. But 
yes, your debugging methods are in line with what I would try.

[1] 
https://github.com/apache/spark/blob/d88a1bae6a9c975c39549ec2326d839ea93949b2/R/pkg/R/context.R#L140
[2] 
https://github.com/apache/spark/blob/d88a1bae6a9c975c39549ec2326d839ea93949b2/R/pkg/R/SQLContext.R#L275
[3] 
https://github.com/apache/spark/blob/d88a1bae6a9c975c39549ec2326d839ea93949b2/R/pkg/inst/worker/worker.R#L159
[4] 
https://github.com/apache/spark/blob/d88a1bae6a9c975c39549ec2326d839ea93949b2/R/pkg/inst/worker/worker.R#L78
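
To illustrate why the type chosen at write time matters for a row-wise stream, here is a small self-contained sketch of a type-tagged row format on the JVM side. It is not SparkR's actual wire protocol: the tags, `RowWire`, `writeRow`, and `readRow` are all hypothetical, and the point is only that a missing value and the string "NA" are distinct on the wire, and that the writer flushes explicitly.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import java.nio.charset.StandardCharsets

// Hypothetical type-tagged row format; NOT SparkR's actual wire protocol.
object RowWire {
  // Writes the field count, then each field as a one-byte tag plus payload.
  def writeRow(out: DataOutputStream, row: Seq[Any]): Unit = {
    out.writeInt(row.length)
    row.foreach {
      case null =>
        out.writeByte('n') // missing value: tag only, no payload
      case i: Int =>
        out.writeByte('i'); out.writeInt(i)
      case s: String =>
        val bytes = s.getBytes(StandardCharsets.UTF_8)
        out.writeByte('c'); out.writeInt(bytes.length); out.write(bytes)
      case other =>
        throw new IllegalArgumentException(s"unsupported type: ${other.getClass}")
    }
    out.flush() // an unflushed channel can look like a truncated or stale stream
  }

  // Reads one row, dispatching on the tag written by writeRow.
  def readRow(in: DataInputStream): Seq[Any] = {
    val n = in.readInt()
    (0 until n).map { _ =>
      in.readByte().toChar match {
        case 'n' => null
        case 'i' => in.readInt()
        case 'c' =>
          val bytes = new Array[Byte](in.readInt())
          in.readFully(bytes)
          new String(bytes, StandardCharsets.UTF_8)
        case tag => throw new IllegalStateException(s"unexpected type tag: $tag")
      }
    }
  }

  // Serializes a row to memory and reads it back.
  def roundTrip(row: Seq[Any]): Seq[Any] = {
    val buffer = new ByteArrayOutputStream()
    writeRow(new DataOutputStream(buffer), row)
    readRow(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray)))
  }
}
```

In a format like this, a field written under the wrong tag is read back as a different type or fails at the tag check, which is the kind of mismatch worth ruling out before inspecting the final byte stream.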



[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11105
  
**[Test build #67082 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67082/consoleFull)**
 for PR 11105 at commit 
[`1490dd0`](https://github.com/apache/spark/commit/1490dd0293173604c4727d17d1d2b1cc97fc7ca0).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11105
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15514: [SPARK-17960][PySpark] [Upgrade to Py4J 0.10.4]

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15514
  
**[Test build #3356 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3356/consoleFull)**
 for PR 15514 at commit 
[`70fa455`](https://github.com/apache/spark/commit/70fa4555a75ad48676a3037c79a1bdb9230c3598).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11105
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67082/
Test FAILed.



[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-17 Thread MLnick

Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/11119
  
Good points. Perhaps a solution, while slightly verbose, is to introduce 
another param `initialModelWriteMode` which governs what is saved: `full`, 
`params` or `none`. `full` saves the entire model, `params` saves only the 
metadata without the data, and `none` saves nothing at all.
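
A minimal sketch of how the three modes might be modelled, assuming a sealed trait and a string parser. The names `InitialModelWriteMode`, `Full`, `Params`, `NoneMode` and `fromString` are hypothetical and are not part of the PR:

```scala
// Hypothetical model of the proposed `initialModelWriteMode` values.
sealed trait InitialModelWriteMode

object InitialModelWriteMode {
  case object Full extends InitialModelWriteMode     // save the entire initial model
  case object Params extends InitialModelWriteMode   // save metadata only, no data
  case object NoneMode extends InitialModelWriteMode // do not save the initial model

  // Parses the user-facing string value, ignoring case.
  def fromString(value: String): InitialModelWriteMode =
    value.toLowerCase match {
      case "full"   => Full
      case "params" => Params
      case "none"   => NoneMode
      case other =>
        throw new IllegalArgumentException(s"Unsupported initialModelWriteMode: $other")
    }
}
```

A writer could then match on the parsed mode to decide whether to persist the full model, only its params, or nothing.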



[GitHub] spark pull request #15520: [SPARK-13747][SQL]Fix concurrent executions in Fo...

2016-10-17 Thread zsxwing

GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/15520

[SPARK-13747][SQL]Fix concurrent executions in ForkJoinPool for SQL

## What changes were proposed in this pull request?

Calling `Await.result` will allow other tasks to be run on the same thread 
when using ForkJoinPool. However, SQL uses a `ThreadLocal` execution id to 
trace Spark jobs launched by a query, which doesn't work perfectly in 
ForkJoinPool.

This PR just uses `Awaitable.result` instead to prevent ForkJoinPool from 
running other tasks in the current waiting thread.
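
As a rough illustration only: `Await.result` wraps the wait in `blocking`, which is what lets a ForkJoinPool schedule other work around the waiting thread. The sketch below calls `Awaitable.result` directly so that `blocking` is not involved. The object and method names are hypothetical, and passing a `null` `CanAwait` permit is an assumption about one way to make the direct call outside `scala.concurrent`. It may not match the exact change in this PR.

```scala
import scala.concurrent.{Awaitable, CanAwait}
import scala.concurrent.duration.Duration

// Hypothetical helper, not the PR's code.
object AwaitSketch {
  // Waits on the awaitable without going through Await.result, so the wait
  // is not wrapped in `blocking` and the pool is not told to run other tasks.
  def awaitWithoutBlockContext[T](awaitable: Awaitable[T], atMost: Duration): T = {
    // Assumption: the permit is only a compile-time marker, so null is passed here.
    val permit = null.asInstanceOf[CanAwait]
    awaitable.result(atMost)(permit)
  }
}
```

Because the waiting thread does not pick up unrelated tasks in this sketch, a `ThreadLocal` such as the SQL execution id would not be observed or overwritten by another task on that thread during the wait.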

## How was this patch tested?

Jenkins

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark SPARK-13747

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15520.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15520


commit 54078ce1e33a05c740840e74f37834934f085d79
Author: Shixiong Zhu 
Date:   2016-10-17T20:57:10Z

Fix concurrent executions in ForkJoinPool for SQL





[GitHub] spark pull request #15519: [SQL][STREAMING][TEST] Fix flaky tests in Streami...

2016-10-17 Thread tdas

GitHub user tdas opened a pull request:

https://github.com/apache/spark/pull/15519

[SQL][STREAMING][TEST] Fix flaky tests in StreamingQueryListenerSuite

This work has largely been done by @lw-lin in his PR #15497. This is a 
slight refactoring of it.

## What changes were proposed in this pull request?
There were two sources of flakiness in StreamingQueryListener test.

- When testing with manual clock, consecutive attempts to advance the clock 
can occur without the stream execution thread being unblocked and doing some 
work between the two attempts. Hence the following can happen with the current 
ManualClock.
```
+-----------------------------------+--------------------------------+
|      StreamExecution thread       |         testing thread         |
+-----------------------------------+--------------------------------+
|  ManualClock.waitTillTime(100) {  |                                |
|        _isWaiting = true          |                                |
|            wait(10)               |                                |
|        still in wait(10)          |  if (_isWaiting) advance(100)  |
|        still in wait(10)          |  if (_isWaiting) advance(200)  | <- this should be disallowed !
|        still in wait(10)          |  if (_isWaiting) advance(300)  | <- this should be disallowed !
|      wake up from wait(10)        |                                |
|       current time is 600         |                                |
|       _isWaiting = false          |                                |
|  }                                |                                |
+-----------------------------------+--------------------------------+
```

- The second source of flakiness is that data added to the memory stream may 
get processed in any trigger, not just the first trigger.


My fix is to make the manual clock wait for the other stream execution 
thread to start waiting for the clock at the right wait start time. That is, 
`advance(300)` (see above) will wait for stream execution thread to complete 
the wait that started at time 100, and start a new wait at time 300 (i.e. time 
stamp after the previous `advance(200)`).
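
The idea can be sketched as a clock that records the time at which the current wait started, so the testing thread can check it before advancing. This is a minimal illustration under assumptions: the class `SketchManualClock` and the method `isWaitingAt` are hypothetical names, not the PR's `StreamManualClock` API.

```scala
// Hypothetical manual clock that remembers when the current wait began.
class SketchManualClock(start: Long = 0L) {
  private var now: Long = start
  private var waitStartTime: Option[Long] = None

  def getTimeMillis: Long = synchronized { now }

  // Moves the clock forward and wakes up any waiting thread.
  def advance(delta: Long): Unit = synchronized {
    now += delta
    notifyAll()
  }

  // Blocks until the clock reaches `target`, recording the wait's start time.
  def waitTillTime(target: Long): Long = synchronized {
    waitStartTime = Some(now)
    try {
      while (now < target) wait(10)
      now
    } finally {
      waitStartTime = None
    }
  }

  // True only while some thread is inside a wait that started at `time`.
  def isWaitingAt(time: Long): Boolean = synchronized {
    waitStartTime.contains(time)
  }
}
```

In a test, the testing thread would poll `isWaitingAt(expectedStart)` until it returns true and only then call `advance`, which rules out the back-to-back advances shown in the table above.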

In addition, since this is a feature that is solely used by 
StreamExecution, I removed all the non-generic code from ManualClock and put 
it in StreamManualClock inside StreamTest.


## How was this patch tested?
Ran the existing unit tests many times in Jenkins.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tdas/spark metrics-flaky-test-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15519.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15519


commit 5bc47b639ede049f44ad4f47a88d26219fea6193
Author: Liwei Lin 
Date:   2016-10-15T02:21:58Z

Fix flaky test

commit eb59a98146f30163675cec3b52f69fedd7a234fc
Author: Liwei Lin 
Date:   2016-10-17T13:15:40Z

Revert "Fix flaky test"

This reverts commit 5bc47b639ede049f44ad4f47a88d26219fea6193.

commit 7ae7782cdede0c3f2a3db0a09401cf0d682a264f
Author: Liwei Lin 
Date:   2016-10-17T11:53:46Z

Fix flaky test again

commit 6fdbae34a6e806ad0ca8bb6cfd6ff630e0b84143
Author: Tathagata Das 
Date:   2016-10-17T20:46:48Z

Refactored Manual clock





[GitHub] spark issue #15519: [SQL][STREAMING][TEST] Fix flaky tests in StreamingQuery...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15519
  
**[Test build #67087 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67087/consoleFull)**
 for PR 15519 at commit 
[`6fdbae3`](https://github.com/apache/spark/commit/6fdbae34a6e806ad0ca8bb6cfd6ff630e0b84143).



[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11105
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67068/
Test PASSed.



[GitHub] spark issue #15520: [SPARK-13747][SQL]Fix concurrent executions in ForkJoinP...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15520
  
**[Test build #67088 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67088/consoleFull)**
 for PR 15520 at commit 
[`54078ce`](https://github.com/apache/spark/commit/54078ce1e33a05c740840e74f37834934f085d79).



[GitHub] spark issue #15492: [DO NOT MERGE][TEST] Testing flakiness of StreamingQuery...

2016-10-17 Thread tdas

Github user tdas commented on the issue:

https://github.com/apache/spark/pull/15492
  
I am closing this. I opened a new PR for more testing of a fix - 
https://github.com/apache/spark/pull/15497




[GitHub] spark issue #15472: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15472
  
**[Test build #67086 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67086/consoleFull)**
 for PR 15472 at commit 
[`1a32b39`](https://github.com/apache/spark/commit/1a32b396a3ea928ed3b6882aad50dce59ccf7c47).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15472: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15472
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67086/
Test FAILed.



[GitHub] spark issue #15472: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15472
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15490: [SPARK-10541] [Web UI] Allow ApplicationHistoryProviders...

2016-10-17 Thread ajbozarth

Github user ajbozarth commented on the issue:

https://github.com/apache/spark/pull/15490
  
@steveloughran Since you opened the JIRA do you mind taking a look?



[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-17 Thread wangmiao1981

Github user wangmiao1981 commented on the issue:

https://github.com/apache/spark/pull/15421
  
@shivaram Thanks for your explanation! I can continue debugging this as 
you pointed out, and I can consistently reproduce the issue. For this PR, I 
think it is fine to handle the `NA` in the backend, except for the unnecessary 
exception handling. I can submit a follow-up PR on the serialization part. 



[GitHub] spark pull request #15518: [SPARK-17974] Refactor FileCatalog classes to sim...

2016-10-17 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15518#discussion_r83737706
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileCatalog.scala
 ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.hadoop.fs._
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions._
+
+/**
+ * A collection of data files from a partitioned relation, along with the partition values in the
+ * form of an [[InternalRow]].
+ */
+case class Partition(values: InternalRow, files: Seq[FileStatus])
+
+/**
+ * An interface for objects capable of enumerating the root paths of a relation as well as the
+ * partitions of a relation subject to some pruning expressions.
+ */
+trait FileCatalog {
+
+  /**
+   * Returns the list of root input paths from which the catalog will get files. There may be a
+   * single root path from which partitions are discovered, or individual partitions may be
+   * specified by each path.
+   */
+  def rootPaths: Seq[Path]
--- End diff --

what's "root" about this?




[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15376
  
**[Test build #67083 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67083/consoleFull)**
 for PR 15376 at commit 
[`c74191b`](https://github.com/apache/spark/commit/c74191ba1a3c0867b92953f3320716f93853db56).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...

2016-10-17 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15376
  
Retest this please.



[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15376
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67083/
Test PASSed.



[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15376
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15376
  
**[Test build #67089 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67089/consoleFull)**
 for PR 15376 at commit 
[`933ad85`](https://github.com/apache/spark/commit/933ad856c7fb3712a39557e09da4bc22b75b905c).



[GitHub] spark issue #15495: [SPARK-17620][SQL] Determine Serde by hive.default.filef...

2016-10-17 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/15495
  
@gatorsmile If this pr fixes the problem related to the build, I am fine to 
merge it.



[GitHub] spark issue #15518: [SPARK-17974] Refactor FileCatalog classes to simplify t...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15518
  
**[Test build #67084 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67084/consoleFull)**
 for PR 15518 at commit 
[`5d390c9`](https://github.com/apache/spark/commit/5d390c97f61af4d84c194344d8dd4c86fb41eb32).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15518: [SPARK-17974] Refactor FileCatalog classes to simplify t...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15518
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15518: [SPARK-17974] Refactor FileCatalog classes to simplify t...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15518
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67084/
Test PASSed.



[GitHub] spark issue #15495: [SPARK-17620][SQL] Determine Serde by hive.default.filef...

2016-10-17 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15495
  
Thank you! Will do it soon. 



[GitHub] spark pull request #15518: [SPARK-17974] Refactor FileCatalog classes to sim...

2016-10-17 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15518#discussion_r83738846
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileCatalog.scala
 ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.hadoop.fs._
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions._
+
+/**
+ * A collection of data files from a partitioned relation, along with the partition values in the
+ * form of an [[InternalRow]].
+ */
+case class Partition(values: InternalRow, files: Seq[FileStatus])
--- End diff --

while you are doing it, perhaps we can rename this TaskPartition? 



[GitHub] spark pull request #15518: [SPARK-17974] Refactor FileCatalog classes to sim...

2016-10-17 Thread mallman

Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/15518#discussion_r83738938
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileCatalog.scala
 ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.hadoop.fs._
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions._
+
+/**
+ * A collection of data files from a partitioned relation, along with the partition values in the
+ * form of an [[InternalRow]].
+ */
+case class Partition(values: InternalRow, files: Seq[FileStatus])
+
+/**
+ * An interface for objects capable of enumerating the root paths of a relation as well as the
+ * partitions of a relation subject to some pruning expressions.
+ */
+trait FileCatalog {
+
+  /**
+   * Returns the list of root input paths from which the catalog will get files. There may be a
+   * single root path from which partitions are discovered, or individual partitions may be
+   * specified by each path.
+   */
+  def rootPaths: Seq[Path]
--- End diff --

I would say "pretty much nothing" anymore.

In an earlier version, it was the "root" path of the table, excluding any 
partition dirs. The PR drifted away from that definition.

Now I'd say it could be reverted to `paths`.



[GitHub] spark issue #15519: [WIP][SQL][STREAMING][TEST] Fix flaky tests in Streaming...

2016-10-17 Thread tdas

Github user tdas commented on the issue:

https://github.com/apache/spark/pull/15519
  
@lw-lin  please take a look.



[GitHub] spark issue #15497: [Test][SPARK-16002][Follow-up] Fix flaky test in Streami...

2016-10-17 Thread tdas

Github user tdas commented on the issue:

https://github.com/apache/spark/pull/15497
  
I opened a PR, #15519, after modifying your branch. Since you did the initial 
investigation, I will mark you as the author when I merge it.



[GitHub] spark issue #15519: [WIP][SQL][STREAMING][TEST] Fix flaky tests in Streaming...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15519
  
**[Test build #3358 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3358/consoleFull)** for PR 15519 at commit [`6fdbae3`](https://github.com/apache/spark/commit/6fdbae34a6e806ad0ca8bb6cfd6ff630e0b84143).



[GitHub] spark issue #15472: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15472
  
**[Test build #3357 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3357/consoleFull)** for PR 15472 at commit [`1a32b39`](https://github.com/apache/spark/commit/1a32b396a3ea928ed3b6882aad50dce59ccf7c47).



[GitHub] spark pull request #15518: [SPARK-17974] Refactor FileCatalog classes to sim...

2016-10-17 Thread mallman

Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/15518#discussion_r83741391
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala
 ---
@@ -626,8 +626,9 @@ class ParquetPartitionDiscoverySuite extends QueryTest with ParquetTest with Sha
   (1 to 10).map(i => (i, i.toString)).toDF("a", "b").write.parquet(dir.getCanonicalPath)
   val queryExecution = spark.read.parquet(dir.getCanonicalPath).queryExecution
   queryExecution.analyzed.collectFirst {
-case LogicalRelation(HadoopFsRelation(location: FileCatalog, _, _, _, _, _), _, _) =>
-  assert(location.partitionSpec === PartitionSpec.emptySpec)
+case LogicalRelation(
+HadoopFsRelation(location: PartitioningAwareFileCatalog, _, _, _, _, _), _, _) =>
+  assert(location.partitionSpec() === PartitionSpec.emptySpec)
   }.getOrElse {
 fail(s"Expecting a ParquetRelation2, but got:\n$queryExecution")
--- End diff --

We're not expecting a `ParquetRelation2` anymore; more like a 
`HadoopFsRelation` with a `PartitioningAwareFileCatalog`.



[GitHub] spark issue #15450: [SPARK-3261] [MLLIB] KMeans clusterer can return duplica...

2016-10-17 Thread sethah

Github user sethah commented on the issue:

https://github.com/apache/spark/pull/15450
  
The cases you enumerated are the ones I was thinking of. The changes 
introduced here would alleviate those problems, I agree. What I'm wondering is 
whether this problem still exists in other cases. If Derrick had 1.3M data points 
and asked for 10k clusters, and got only 1k unique cluster centers, why did it 
happen? Is it common, or even possible, for clusters that start in different 
locations to converge to the same point? I'd be interested in replicating this 
issue, though I'm not sure I will have the time.

Also, I'm slightly inclined to match scikit or R unless we're certain there is 
a clear benefit to not doing so.
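
For anyone trying to replicate the duplicate-centers report, a minimal sketch of counting unique centers might look like the following. It assumes the centers have already been extracted as plain `Array[Double]` values (for an MLlib `KMeansModel` that would be something like `model.clusterCenters.map(_.toArray)`); the `DistinctCenters` object and `countDistinct` name are hypothetical and not part of Spark.

```scala
// Hypothetical helper, not part of Spark: counts how many cluster centers
// are distinct when compared element by element.
object DistinctCenters {
  // Scala Arrays compare by reference, so convert each center to a Seq first
  // to get value equality. Comparison is exact: centers that differ only by
  // floating-point noise are counted as different.
  def countDistinct(centers: Seq[Array[Double]]): Int =
    centers.map(_.toSeq).distinct.size
}
```

Comparing this count with the requested `k` shows how many centers were duplicated; a tolerance-based comparison would be needed to treat nearly identical centers as duplicates.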



[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...

2016-10-17 Thread kishorvpatil

Github user kishorvpatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/15009#discussion_r83744591
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -760,7 +787,7 @@ private[spark] class Client(
   .foreach { case (k, v) => YarnSparkHadoopUtil.addPathToEnvironment(env, k, v) }
 
 // Keep this for backwards compatibility but users should move to the config
-sys.env.get("SPARK_YARN_USER_ENV").foreach { userEnvs =>
+sysEnvironment.get("SPARK_YARN_USER_ENV").foreach { userEnvs =>
--- End diff --

created https://issues.apache.org/jira/browse/SPARK-17979 - Remove 
deprecated support for config 



[GitHub] spark pull request #15521: [SPARK-17980] [SQL] Fix refreshByPath for convert...

2016-10-17 Thread ericl

GitHub user ericl opened a pull request:

https://github.com/apache/spark/pull/15521

[SPARK-17980] [SQL] Fix refreshByPath for converted Hive tables

## What changes were proposed in this pull request?

There was a bug introduced in https://github.com/apache/spark/pull/14690 
which broke refreshByPath with converted Hive tables (though it turns out it 
was very difficult to refresh converted Hive tables anyway, since you had to 
specify the exact path of one of the partitions).

This changes refreshByPath to invalidate by prefix instead of exact match, 
and fixes the issue.
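
To illustrate the difference between the two matching rules, here is a minimal sketch over a set of cached path strings. It is not the code from this PR: the `PrefixInvalidation` object, its method names, and the use of plain strings as cache keys are all assumptions made for the example.

```scala
// Hypothetical stand-in for a cache keyed by path strings.
// None of these names come from Spark; they only illustrate the matching rule.
object PrefixInvalidation {
  // Append a trailing slash so that "/warehouse/t" does not match
  // the unrelated sibling "/warehouse/table2".
  private def withTrailingSlash(p: String): String =
    if (p.endsWith("/")) p else p + "/"

  // Exact match: only an entry cached under precisely this path is dropped.
  def invalidateExact(cached: Set[String], path: String): Set[String] =
    cached.filterNot(_ == path)

  // Prefix match: the path itself and every entry beneath it is dropped.
  def invalidateByPrefix(cached: Set[String], path: String): Set[String] = {
    val prefix = withTrailingSlash(path)
    cached.filterNot(entry => withTrailingSlash(entry).startsWith(prefix))
  }
}
```

With entries cached per partition directory, refreshing the table root removes nothing under exact matching, while prefix matching drops every partition of that table and leaves other tables alone.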

cc @sameeragarwal for refreshByPath changes
@mallman 

## How was this patch tested?

Extended unit test.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ericl/spark fix-caching

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15521.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15521


commit 74156663514001d4ab7d6aca6a356284cc2bc019
Author: Eric Liang 
Date:   2016-10-17T22:00:03Z

Mon Oct 17 15:00:03 PDT 2016





[GitHub] spark issue #15471: [SPARK-17919] Make timeout to RBackend configurable in S...

2016-10-17 Thread falaki

Github user falaki commented on the issue:

https://github.com/apache/spark/pull/15471
  
@shivaram I think they are unrelated. Can you trigger another test?


