[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-06 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/15249
  

Thinking more, and based on what @squito mentioned, I was considering the 
following:

Since we are primarily dealing with executors or nodes which are 'bad', as 
opposed to recoverable failures due to resource contention, prevention of 
degenerate corner cases which the existing blacklist is for, etc.:

Can we assume that a successful task execution on a node implies a healthy node?
What about at the executor level?

The proposal is to keep the PR as is for the most part, but:
- Clear nodeToExecsWithFailures when a task on a node succeeds. Same for 
nodeToBlacklistedTaskIndexes.
- Not sure if we want to reset execToFailures for an executor (not clearing 
would imply we are handling the resource starvation case implicitly, imo).
- If possible, allow speculative tasks to be scheduled on blacklisted 
nodes/executors if countTowardsTaskFailures can be overridden to false in 
those cases (if not, ignore this, since it would add to the number of 
failures per app).
 
The rationale behind this is that successful tasks indicate past failures 
were not indicative of bad nodes/executors, but rather transient failures. And 
speculative tasks also sort of work as probe tasks to determine if the 
node/executor has recovered and is healthy.
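To make the reset concrete, here is a minimal sketch of the kind of bookkeeping being proposed, using hypothetical class and method names that mirror the discussion rather than the PR's actual code:

```scala
import scala.collection.mutable

// Hypothetical sketch only -- not the PR's implementation. The field names
// (nodeToExecsWithFailures, nodeToBlacklistedTaskIndexes, execToFailures)
// mirror the discussion above.
class TaskSetBlacklistSketch {
  val nodeToExecsWithFailures = mutable.HashMap.empty[String, mutable.HashSet[String]]
  val nodeToBlacklistedTaskIndexes = mutable.HashMap.empty[String, mutable.HashSet[Int]]
  val execToFailures = mutable.HashMap.empty[String, Int]

  // Called when any task succeeds on `host`: treat earlier failures there as transient.
  def handleTaskSuccess(host: String): Unit = {
    nodeToExecsWithFailures.remove(host)
    nodeToBlacklistedTaskIndexes.remove(host)
    // execToFailures is deliberately left untouched, so the resource-starvation
    // case is still handled implicitly at the executor level.
  }
}
```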

I hope I am not missing anything - any thoughts @squito, @kayousterhout, 
@tgravescs ?





[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...

2016-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15361
  
Hi @kxepal , I just tested (copied and pasted) the code below:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("Spark Hive Example").enableHiveSupport().getOrCreate()
import spark.implicits._

val sv = org.apache.spark.mllib.linalg.Vectors.sparse(7, Array(0, 42), Array(-127, 128))
val df = Seq(("thing", sv)).toDF("thing", "vector")
df.write.format("orc").save("/tmp/thing.orc")
```

and it seems fine with the current master branch. Do you mind if I verify this 
again once it is hopefully backported to branch-2.0?





[GitHub] spark issue #15366: [SPARK-17793] [Web UI] Sorting on the description on the...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15366
  
**[Test build #66480 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66480/consoleFull)**
 for PR 15366 at commit 
[`c1d2b2b`](https://github.com/apache/spark/commit/c1d2b2bd1e1a12791a180f1b753ca082c97df31c).





[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15387
  
**[Test build #66479 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66479/consoleFull)**
 for PR 15387 at commit 
[`aca55de`](https://github.com/apache/spark/commit/aca55de0624f5634acb04f91636dce79af875fab).





[GitHub] spark issue #15355: [SPARK-17782][STREAMING] Disable Kafka 010 pattern based...

2016-10-06 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/15355
  
@zsxwing good eye, thanks.  It's not that auto.offset.reset=earliest 
doesn't work; it's that there's a potential race condition where poll gets 
called twice slowly enough for the consumer position to be modified before 
the topicpartitions are paused.

https://github.com/apache/spark/pull/15387

should address that.

It's something that whoever works on the duplicated equivalent code in the 
structured streaming module will also have to address.
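For context, a minimal sketch of the driver-side pattern involved, written against a plain Kafka 0.10 consumer (illustrative only, not the code from either PR):

```scala
import org.apache.kafka.clients.consumer.KafkaConsumer

// Illustrative only: poll with a timeout of 0 so the consumer joins the group and
// gets its assignment, then pause every assigned partition right away so a later
// poll cannot move the position. The race is the window between poll and pause.
def pollThenPause(consumer: KafkaConsumer[String, String]): Unit = {
  consumer.poll(0)
  consumer.pause(consumer.assignment())
}
```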





[GitHub] spark issue #15366: [SPARK-17793] [Web UI] Sorting on the description on the...

2016-10-06 Thread ajbozarth
Github user ajbozarth commented on the issue:

https://github.com/apache/spark/pull/15366
  
Jenkins, retest this please





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15307
  
**[Test build #66478 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66478/consoleFull)**
 for PR 15307 at commit 
[`10d1c24`](https://github.com/apache/spark/commit/10d1c243a71d464ada33db269a30ad0e4dff3ced).





[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15387
  
**[Test build #66477 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66477/consoleFull)**
 for PR 15387 at commit 
[`1fc5863`](https://github.com/apache/spark/commit/1fc5863db88cac9dfd0be09318c4ca8779a51682).





[GitHub] spark issue #15379: [SPARK-17805][PYSPARK] Fix in sqlContext.read.text when ...

2016-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15379
  
+1 for this PR.





[GitHub] spark pull request #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race co...

2016-10-06 Thread koeninger
GitHub user koeninger opened a pull request:

https://github.com/apache/spark/pull/15387

[SPARK-17782][STREAMING][KAFKA] eliminate race condition of poll twice

## What changes were proposed in this pull request?

Kafka consumers can't subscribe or maintain a heartbeat without polling, but 
polling ordinarily consumes messages and adjusts the consumer position.  We 
don't want this on the driver, so we poll with a timeout of 0 and pause all 
topicpartitions.

Some consumer strategies that seek to particular positions have to poll 
first, but they weren't pausing immediately thereafter.  Thus, there was a race 
condition where the second poll() in the DStream start method might actually 
adjust the consumer position.

This eliminates (or at least drastically reduces the chance of) the race 
condition by pausing in the relevant consumer strategies, and asserts on 
startup that no messages were consumed.
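A rough sketch of what the added startup check amounts to, again illustrative rather than the PR's actual code:

```scala
import org.apache.kafka.clients.consumer.ConsumerRecords

// Illustrative only: after the consumer strategy has polled (timeout 0) and paused
// its assignment, the driver can assert that nothing was actually consumed, so any
// remaining race surfaces as a clear failure instead of silently skipped records.
def assertNothingConsumed(records: ConsumerRecords[String, String]): Unit = {
  assert(records.isEmpty,
    s"Expected 0 messages on the driver, but poll returned ${records.count()}")
}
```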

## How was this patch tested?

I reliably reproduced the intermittent test failure by inserting a 
thread.sleep directly before returning from SubscribePattern.  The suggested 
fix eliminated the failure.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/koeninger/spark-1 SPARK-17782

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15387.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15387


commit 1fc5863db88cac9dfd0be09318c4ca8779a51682
Author: cody koeninger 
Date:   2016-10-07T01:08:01Z

[SPARK-17782][STREAMING][KAFKA] eliminate race condition of poll being 
called twice and moving position







[GitHub] spark issue #15366: [SPARK-17793] [Web UI] Sorting on the description on the...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15366
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66469/
Test FAILed.





[GitHub] spark pull request #15379: [SPARK-17805][PYSPARK] Fix in sqlContext.read.tex...

2016-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15379#discussion_r82315908
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -289,8 +289,8 @@ def text(self, paths):
         [Row(value=u'hello'), Row(value=u'this')]
         """
         if isinstance(paths, basestring):
-            path = [paths]
-            return self._df(self._jreader.text(self._spark._sc._jvm.PythonUtils.toSeq(path)))
+            paths = [paths]
+            return self._df(self._jreader.text(self._spark._sc._jvm.PythonUtils.toSeq(paths)))
--- End diff --

This is super minor, but I think it'd be nicer to match the variable name up to 
`path` if this makes sense. For parquet it takes non-keyword arguments, so 
`paths` makes sense, but the others seem to take a single argument.





[GitHub] spark issue #15366: [SPARK-17793] [Web UI] Sorting on the description on the...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15366
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15366: [SPARK-17793] [Web UI] Sorting on the description on the...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15366
  
**[Test build #66469 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66469/consoleFull)**
 for PR 15366 at commit 
[`c1d2b2b`](https://github.com/apache/spark/commit/c1d2b2bd1e1a12791a180f1b753ca082c97df31c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15307
  
**[Test build #66475 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66475/consoleFull)**
 for PR 15307 at commit 
[`2918525`](https://github.com/apache/spark/commit/29185254d325834c40bd63a543317950b2794b30).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66475/
Test FAILed.





[GitHub] spark issue #15263: [SPARK-14525][SQL][FOLLOWUP] Clean up JdbcRelationProvid...

2016-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15263
  
Hi @gatorsmile, would there be any other things I should take care of?





[GitHub] spark pull request #15354: [SPARK-17764][SQL] Add `to_json` supporting to co...

2016-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15354#discussion_r82314175
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1729,6 +1729,29 @@ def from_json(col, schema, options={}):
     return Column(jc)
 
 
+@ignore_unicode_prefix
+@since(2.1)
+def to_json(col, options={}):
+    """
+    Converts a column containing a [[StructType]] into a JSON string.
+    Returns `null`, in the case of an unsupported type.
+
+    :param col: struct column
+    :param options: options to control converting. accepts the same options as the json datasource
+
+    >>> from pyspark.sql import Row
+    >>> from pyspark.sql.types import *
+    >>> data = [(1, Row(name='Alice', age=2))]
+    >>> df = spark.createDataFrame(data, ("key", "value"))
+    >>> df.select(to_json(df.value).alias("json")).collect()
+    [Row(json=u'{"age":2,"name":"Alice"}')]
+    """
+
+    sc = SparkContext._active_spark_context
+    jc = sc._jvm.functions.to_json(_to_java_column(col), options)
--- End diff --

@holdenk Thank you for your comment. Could you please elaborate on it a bit? 
I am not sure what to fix.





[GitHub] spark pull request #15354: [SPARK-17764][SQL] Add `to_json` supporting to co...

2016-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15354#discussion_r82313950
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1729,6 +1729,29 @@ def from_json(col, schema, options={}):
     return Column(jc)
 
 
+@ignore_unicode_prefix
+@since(2.1)
+def to_json(col, options={}):
+    """
+    Converts a column containing a [[StructType]] into a JSON string.
+    Returns `null`, in the case of an unsupported type.
+
+    :param col: struct column
--- End diff --

Sure, let me try to double check other comments as well.





[GitHub] spark pull request #15367: [SPARK-17346][SQL][test-maven]Add Kafka source fo...

2016-10-06 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/15367#discussion_r82313748
  
--- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala ---
@@ -0,0 +1,425 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.util.Random
+
+import org.apache.kafka.clients.producer.RecordMetadata
+import org.scalatest.BeforeAndAfter
+import org.scalatest.time.SpanSugar._
+
+import org.apache.spark.sql.execution.streaming._
+import org.apache.spark.sql.streaming.StreamTest
+import org.apache.spark.sql.test.SharedSQLContext
+
+
+abstract class KafkaSourceTest extends StreamTest with SharedSQLContext {
+
+  protected var testUtils: KafkaTestUtils = _
+
+  override val streamingTimeout = 30.seconds
+
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    testUtils = new KafkaTestUtils
+    testUtils.setup()
+  }
+
+  override def afterAll(): Unit = {
+    if (testUtils != null) {
+      testUtils.teardown()
+      testUtils = null
+      super.afterAll()
+    }
+  }
+
+  protected def makeSureGetOffsetCalled = AssertOnQuery { q =>
+    // Because KafkaSource's initialPartitionOffsets is set lazily, we need to make sure
+    // its "getOffset" is called before pushing any data. Otherwise, because of the race condition,
+    // we don't know which data should be fetched when `startingOffset` is latest.
+    q.processAllAvailable()
+    true
+  }
+
+  /**
+   * Add data to Kafka.
+   *
+   * `topicAction` can be used to run actions for each topic before inserting data.
+   */
+  case class AddKafkaData(topics: Set[String], data: Int*)
+    (implicit ensureDataInMultiplePartition: Boolean = false,
+      concurrent: Boolean = false,
+      message: String = "",
+      topicAction: (String, Option[Int]) => Unit = (_, _) => {}) extends AddData {
+
+    override def addData(query: Option[StreamExecution]): (Source, Offset) = {
+      if (query.get.isActive) {
+        // Make sure no Spark job is running when deleting a topic
+        query.get.processAllAvailable()
+      }
+
+      val existingTopics = testUtils.getAllTopicsAndPartitionSize().toMap
+      val newTopics = topics.diff(existingTopics.keySet)
+      for (newTopic <- newTopics) {
+        topicAction(newTopic, None)
+      }
+      for (existingTopicPartitions <- existingTopics) {
+        topicAction(existingTopicPartitions._1, Some(existingTopicPartitions._2))
+      }
+
+      // Read all topics again in case some topics are deleted.
+      val allTopics = testUtils.getAllTopicsAndPartitionSize().toMap.keys
+      require(
+        query.nonEmpty,
+        "Cannot add data when there is no query for finding the active kafka source")
+
+      val sources = query.get.logicalPlan.collect {
+        case StreamingExecutionRelation(source, _) if source.isInstanceOf[KafkaSource] =>
+          source.asInstanceOf[KafkaSource]
+      }
+      if (sources.isEmpty) {
+        throw new Exception(
+          "Could not find Kafka source in the StreamExecution logical plan to add data to")
+      } else if (sources.size > 1) {
+        throw new Exception(
+          "Could not select the Kafka source in the StreamExecution logical plan as there" +
+            "are multiple Kafka sources:\n\t" + sources.mkString("\n\t"))
+      }
+      val kafkaSource = sources.head
+      val topic = topics.toSeq(Random.nextInt(topics.size))
+      val sentMetadata = testUtils.sendMessages(topic, data.map { _.toString }.toArray)
+
+      def metadataToStr(m: (String, RecordMetadata)): String = {
+        s"Sent ${m._1} to partition ${m._2.partition()}, offset ${m._2.offset()}"

[GitHub] spark issue #15367: [SPARK-17346][SQL][test-maven]Add Kafka source for Struc...

2016-10-06 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/15367
  
> @zsxwing Have you tested the maven build?

> Test build #66433 has finished for PR 15367 at commit 0915826.
> 
> This patch passes all tests.
> This patch merges cleanly.
> This patch adds no public classes.

This one is the maven build.





[GitHub] spark issue #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up options...

2016-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15292
  
@gatorsmile Sure, let me just wait and sweep it once!





[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11601
  
**[Test build #66476 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66476/consoleFull)**
 for PR 11601 at commit 
[`91d4cee`](https://github.com/apache/spark/commit/91d4cee75a150ad2335dba0838c47cb4f0505ad8).





[GitHub] spark issue #15371: [SPARK-17816] [Core] Fix ConcurrentModificationException...

2016-10-06 Thread seyfe
Github user seyfe commented on the issue:

https://github.com/apache/spark/pull/15371
  
I also want to point out that below is the core part of the fix. The rest of 
the code changes are side effects of it.

```scala
+  // `asScala` accesses the internal values using `java.util.Iterator` so needs to be synchronized
+  override def value: List[(BlockId, BlockStatus)] = {
+    _seq.synchronized {
+      _seq.asScala.toList
+    }
+  }
```





[GitHub] spark issue #15367: [SPARK-17346][SQL][test-maven]Add Kafka source for Struc...

2016-10-06 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/15367
  
@zsxwing Have you tested the maven build?





[GitHub] spark pull request #15367: [SPARK-17346][SQL][test-maven]Add Kafka source fo...

2016-10-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15367#discussion_r82313116
  
--- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala ---
(quoted diff omitted; identical to the KafkaSourceSuite.scala excerpt quoted above)

[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15307
  
**[Test build #66475 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66475/consoleFull)**
 for PR 15307 at commit 
[`2918525`](https://github.com/apache/spark/commit/29185254d325834c40bd63a543317950b2794b30).





[GitHub] spark issue #15384: [SPARK-17346][SQL][Tests]Fix the flaky topic deletion in...

2016-10-06 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/15384
  
Correction. Merging to master only. Can you fix the 2.0 PR with this issue 
since that is still open?





[GitHub] spark issue #15384: [SPARK-17346][SQL][Tests]Fix the flaky topic deletion in...

2016-10-06 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/15384
  
LGTM. Merging it to master and 2.0





[GitHub] spark issue #15386: [SPARK-17808][PYSPARK] Upgraded version of Pyrolite to 4...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15386
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15386: [SPARK-17808][PYSPARK] Upgraded version of Pyrolite to 4...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15386
  
**[Test build #66474 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66474/consoleFull)**
 for PR 15386 at commit 
[`a3468f4`](https://github.com/apache/spark/commit/a3468f45169a9a072d6ae2d730a7aeb062339ec1).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15386: [SPARK-17808][PYSPARK] Upgraded version of Pyrolite to 4...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15386
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66474/
Test FAILed.





[GitHub] spark issue #14215: [SPARK-16544][SQL][WIP] Support for conversion from comp...

2016-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14215
  
@wgtmac BTW, as you might already know, my plan and thought is to implement 
each first and then unify them within a common parent at the end, if possible 
and if it makes sense.





[GitHub] spark issue #15386: [SPARK-17808][PYSPARK] Upgraded version of Pyrolite to 4...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15386
  
**[Test build #66474 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66474/consoleFull)**
 for PR 15386 at commit 
[`a3468f4`](https://github.com/apache/spark/commit/a3468f45169a9a072d6ae2d730a7aeb062339ec1).





[GitHub] spark issue #15371: [SPARK-17816] [Core] Fix ConcurrentModificationException...

2016-10-06 Thread seyfe
Github user seyfe commented on the issue:

https://github.com/apache/spark/pull/15371
  
@zsxwing . `JArray(v.asInstanceOf[java.util.List[(BlockId, 
BlockStatus)]].asScala.toList.map` is the line of code that is causing the 
ConcurrentModificationException, and you can see that from the call stack. It is 
coming from BlockStatusesAccumulator. Am I missing something here?





[GitHub] spark issue #14215: [SPARK-16544][SQL][WIP] Support for conversion from comp...

2016-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14215
  
@wgtmac Thanks for pinging. I think I can proceed with this over the weekend. I 
haven't looked into the vectorized one closely yet. If you have already looked 
into that, I think it'd also make sense not to deal with the vectorized one here 
but in another PR you might open.





[GitHub] spark pull request #15386: [SPARK-17808][PYSPARK] Upgraded version of Pyroli...

2016-10-06 Thread BryanCutler
GitHub user BryanCutler opened a pull request:

https://github.com/apache/spark/pull/15386

[SPARK-17808][PYSPARK] Upgraded version of Pyrolite to 4.13

## What changes were proposed in this pull request?
Upgraded to a newer version of Pyrolite which supports serialization of a 
BinaryType StructField for PySpark.SQL


## How was this patch tested?
Added a unit test which fails with a raised ValueError when using the 
previous version of Pyrolite 4.9 and Python3

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/BryanCutler/spark pyrolite-upgrade-SPARK-17808

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15386.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15386


commit a3468f45169a9a072d6ae2d730a7aeb062339ec1
Author: Bryan Cutler 
Date:   2016-10-07T00:24:11Z

upgraded version of Pyrolite to 4.13 which supports serialization of 
BinaryType StructField







[GitHub] spark issue #15371: [SPARK-17816] [Core] Fix ConcurrentModificationException...

2016-10-06 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/15371
  




> @zsxwing. BlockStatusesAccumulator is using synchronizedList which is 
_seq for concurrency. But when the value method is executed and the output is 
used by asScala.toList it caused the ConcurrentModificationException.
> 
> Collections.synchronizedList suggests using synchronized keyword when 
using its iterator which is done by asScala method. Does that make sense?

@seyfe The comment you are deleting explains why it's safe: the driver 
doesn't modify `BlockStatusesAccumulator`. 





[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-06 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/15249
  
If I understood the change correctly, a node can get blacklisted for a
taskset if sufficient (even different) tasks fail on executors on it,
which can potentially cause all nodes to be blacklisted.
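As a toy illustration of that concern, under assumed thresholds and names (not the PR's actual configuration or code):

```scala
// Illustrative only: if enough *distinct* task indexes have failed across the
// executors on a node, the node is blacklisted for the taskset -- so a handful of
// transient failures spread over a small cluster could blacklist every node.
object BlacklistToyExample extends App {
  val maxFailedTasksPerNode = 2
  val failedTaskIndexesByExec = Map("exec1" -> Set(0, 3), "exec2" -> Set(7))
  val distinctFailedTasksOnNode = failedTaskIndexesByExec.values.flatten.toSet.size
  val nodeBlacklistedForTaskSet = distinctFailedTasksOnNode >= maxFailedTasksPerNode
  println(s"node blacklisted for taskset: $nodeBlacklistedForTaskSet") // true here: 3 >= 2
}
```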

Or do you think this is a contrived scenario that can't occur in practice?  I
don't have sufficient context for the motivating usecases/scenarios for this
feature.

On Oct 6, 2016 3:54 PM, "Kay Ousterhout"  wrote:

> @mridulm  re: job failures, can you elaborate
> on the job failure scenario you're concerned about?
>
> Jobs can only fail when some tasks are unschedulable, which can happen if
> a task is permanently blacklisted on all available nodes. This can only
> happen when the number of nodes is smaller than the maximum number of
> failures for a particular task attempt, and also seems like it's very
> similar to existing behavior: currently, if a task is blacklisted (even
> though the blacklist is temporary) on all nodes, the job will be failed (
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L595).
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub, or mute the thread.






[GitHub] spark issue #15371: [SPARK-17816] [Core] Fix ConcurrentModificationException...

2016-10-06 Thread seyfe
Github user seyfe commented on the issue:

https://github.com/apache/spark/pull/15371
  
@zsxwing. `BlockStatusesAccumulator` is using a synchronizedList, which is 
`_seq`, for concurrency. But when the `value` method is executed and its output 
is used via `asScala.toList`, it caused the `ConcurrentModificationException`. 

`Collections.synchronizedList` suggests using the `synchronized` keyword when 
using its iterator, which is what the `asScala` method does. Does that make sense?
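A minimal standalone illustration of that contract (not Spark's code):

```scala
import java.util.Collections
import scala.collection.JavaConverters._

object SynchronizedListExample extends App {
  // Iterating a Collections.synchronizedList (which asScala.toList does under the
  // hood) must happen while holding the list's monitor, otherwise a concurrent add
  // from another thread can throw ConcurrentModificationException.
  val list = Collections.synchronizedList(new java.util.ArrayList[Int]())
  list.add(1)
  list.add(2)
  val snapshot = list.synchronized {
    list.asScala.toList // safe: the iteration happens inside the synchronized block
  }
  println(snapshot)
}
```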





[GitHub] spark issue #14215: [SPARK-16544][SQL][WIP] Support for conversion from comp...

2016-10-06 Thread wgtmac
Github user wgtmac commented on the issue:

https://github.com/apache/spark/pull/14215
  
@HyukjinKwon Do you have a timeline for this patch? 
Also, what's your plan on vectorized parquet reader? 





[GitHub] spark issue #15384: [SPARK-17346][SQL][Tests]Fix the flaky topic deletion in...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15384
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66471/
Test PASSed.





[GitHub] spark issue #15384: [SPARK-17346][SQL][Tests]Fix the flaky topic deletion in...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15384
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15384: [SPARK-17346][SQL][Tests]Fix the flaky topic deletion in...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15384
  
**[Test build #66471 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66471/consoleFull)**
 for PR 15384 at commit 
[`c7950f0`](https://github.com/apache/spark/commit/c7950f0c4cf5b5e024c033319279af2550f13334).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15218
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66465/
Test PASSed.





[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15218
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15218
  
**[Test build #66465 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66465/consoleFull)**
 for PR 15218 at commit 
[`ed8dd69`](https://github.com/apache/spark/commit/ed8dd69005446532ec77d0d4c5ea60c9e59f0dc5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15371: [SPARK-17816] [Core] Fix ConcurrentModificationException...

2016-10-06 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/15371
  
@seyfe The issue is `TaskInfo.accumulables` is accessed in multiple threads 
without synchronization. `TaskMetrics.updatedBlockStatuses` is fine.





[GitHub] spark issue #15371: [SPARK-17816] [Core] Fix ConcurrentModificationException...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15371
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66473/
Test FAILed.





[GitHub] spark issue #15371: [SPARK-17816] [Core] Fix ConcurrentModificationException...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15371
  
**[Test build #66473 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66473/consoleFull)**
 for PR 15371 at commit 
[`5380aff`](https://github.com/apache/spark/commit/5380aff7d871b03577e7cefb40c616ab747879d5).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15371: [SPARK-17816] [Core] Fix ConcurrentModificationException...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15371
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15371: [SPARK-17816] [Core] Fix ConcurrentModificationEx...

2016-10-06 Thread seyfe
Github user seyfe commented on a diff in the pull request:

https://github.com/apache/spark/pull/15371#discussion_r82309886
  
--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -281,7 +281,7 @@ private[spark] object JsonProtocol {
 ("Finish Time" -> taskInfo.finishTime) ~
 ("Failed" -> taskInfo.failed) ~
 ("Killed" -> taskInfo.killed) ~
-("Accumulables" -> 
JArray(taskInfo.accumulables.map(accumulableInfoToJson).toList))
+("Accumulables" -> 
JArray(taskInfo.accumulables.toList.map(accumulableInfoToJson)))
--- End diff --

This wasn't the root cause but it's something nice to have. If you prefer, 
I can revert this line.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15375: [SPARK-17790] Support for parallelizing R data.frame lar...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15375
  
**[Test build #66472 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66472/consoleFull)**
 for PR 15375 at commit 
[`4aab6cf`](https://github.com/apache/spark/commit/4aab6cf4d6e2f05c1e893cbc6d05fcc1763ea0f4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15371: [SPARK-17816] [Core] Fix ConcurrentModificationException...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15371
  
**[Test build #66473 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66473/consoleFull)**
 for PR 15371 at commit 
[`5380aff`](https://github.com/apache/spark/commit/5380aff7d871b03577e7cefb40c616ab747879d5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up options...

2016-10-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15292
  
Will continue the review tonight. Thanks!



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15365: [SPARK-17157][SPARKR]: Add multiclass logistic regressio...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15365
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15365: [SPARK-17157][SPARKR]: Add multiclass logistic regressio...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15365
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66468/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15365: [SPARK-17157][SPARKR]: Add multiclass logistic regressio...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15365
  
**[Test build #66468 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66468/consoleFull)**
 for PR 15365 at commit 
[`0811fc3`](https://github.com/apache/spark/commit/0811fc3c71b665b99aec1e794d3782c98563e84c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15384: [SPARK-17346][SQL][Tests]Fix the flaky topic deletion in...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15384
  
**[Test build #66471 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66471/consoleFull)**
 for PR 15384 at commit 
[`c7950f0`](https://github.com/apache/spark/commit/c7950f0c4cf5b5e024c033319279af2550f13334).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15384: [SPARK-17346][SQL][Tests]Fix the flaky topic deletion in...

2016-10-06 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/15384
  
> Are you sure we are ignoring all internal topics with that filter?
> Because it may be better not to make that assumption, and instead prefix all the topics we create with something, and delete only those.

Good point. I just used `stress` to filter instead.
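
A sketch of the filtering idea with hypothetical helpers (the topic prefix and the `deleteTopic` function are assumptions, not the suite's actual API):

```scala
// Only delete topics created by the stress suite (named with a "stress" prefix),
// leaving Kafka-internal topics such as __consumer_offsets untouched.
def deleteStressTopics(allTopics: Seq[String], deleteTopic: String => Unit): Unit =
  allTopics.filter(_.startsWith("stress")).foreach(deleteTopic)
```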


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15385: [DO NOT MERGE]Try to reproduce DirectKafkaStreamSuite fa...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15385
  
**[Test build #66470 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66470/consoleFull)**
 for PR 15385 at commit 
[`0fc2da9`](https://github.com/apache/spark/commit/0fc2da9e7d35f645d8564d85389ff74f264d3d00).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15385: [DO NOT MERGE]Try to reproduce DirectKafkaStreamS...

2016-10-06 Thread zsxwing
GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/15385

[DO NOT MERGE]Try to reproduce DirectKafkaStreamSuite failure

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)


## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)


(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark repo

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15385.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15385


commit 0fc2da9e7d35f645d8564d85389ff74f264d3d00
Author: Shixiong Zhu 
Date:   2016-10-06T23:30:55Z

[DO NOT MERGE]Try to reproduce DirectKafkaStreamSuite failure




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r82306624
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1014,16 +1014,31 @@ bin/spark-shell --driver-class-path 
postgresql-9.4.1207.jar --jars postgresql-9.
 {% endhighlight %}
 
 Tables from the remote database can be loaded as a DataFrame or Spark SQL 
Temporary table using
-the Data Sources API. The following options are supported:
+the Data Sources API. The following case-sensitive options are supported:
 
 
   Property NameMeaning
   
 url
 
-  The JDBC URL to connect to.
+  The JDBC URL to connect to. It might contain user and password 
information. e.g., 
jdbc:postgresql://localhost/test?user=fred&password=secret
 
   
+
+  
+user
+
+  The user to connect as.
+
+  
+
+  
+password
+
+  The password to use when connecting.
--- End diff --

`the password for logging into the database `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15384: [SPARK-17346][SQL][Tests]Fix the flaky topic deletion in...

2016-10-06 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/15384
  
Are you sure we are ignoring all internal topics with that filter?
Because it may be better not to make that assumption, and instead prefix all the topics we create with something, and delete only those.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15190: [SPARK-17620][SQL] Determine Serde by hive.default.filef...

2016-10-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15190
  
retest this please



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15366: [SPARK-17793] [Web UI] Sorting on the description on the...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15366
  
**[Test build #66469 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66469/consoleFull)**
 for PR 15366 at commit 
[`c1d2b2b`](https://github.com/apache/spark/commit/c1d2b2bd1e1a12791a180f1b753ca082c97df31c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15351: [SPARK-17612][SQL][branch-2.0] Support `DESCRIBE table P...

2016-10-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15351
  
Could you review this backport, @gatorsmile ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15383: [SPARK-17750][SQL][BACKPORT-2.0] Fix CREATE VIEW ...

2016-10-06 Thread dongjoon-hyun
Github user dongjoon-hyun closed the pull request at:

https://github.com/apache/spark/pull/15383


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15383: [SPARK-17750][SQL][BACKPORT-2.0] Fix CREATE VIEW with IN...

2016-10-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15383
  
Thank you so much, @gatorsmile !


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15383: [SPARK-17750][SQL][BACKPORT-2.0] Fix CREATE VIEW with IN...

2016-10-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15383
  
Merged to 2.0. 

@dongjoon-hyun Could you please close it? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15383: [SPARK-17750][SQL][BACKPORT-2.0] Fix CREATE VIEW with IN...

2016-10-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15383
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15355: [SPARK-17782][STREAMING] Disable Kafka 010 pattern based...

2016-10-06 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/15355
  
@koeninger you can download the unit test logs from 
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6/1756/artifact/

I saw that the offsets of the first batch (pat1 0 16 16) were wrong in the unit test logs:
```
16/10/05 20:42:46.313 streaming-start INFO JobGenerator: Started 
JobGenerator at 1475725367000 ms
16/10/05 20:42:46.313 streaming-start INFO JobScheduler: Started 
JobScheduler
16/10/05 20:42:46.313 
pool-1-thread-1-ScalaTest-running-DirectKafkaStreamSuite INFO StreamingContext: 
StreamingContext started
16/10/05 20:42:47.015 JobGenerator INFO JobScheduler: Added jobs for time 
1475725367000 ms
16/10/05 20:42:47.016 JobScheduler INFO JobScheduler: Starting job 
streaming job 1475725367000 ms.0 from job set of time 1475725367000 ms
16/10/05 20:42:47.017 streaming-job-executor-0 INFO DirectKafkaStreamSuite: 
pat2 0 3 16
16/10/05 20:42:47.017 streaming-job-executor-0 INFO DirectKafkaStreamSuite: 
pat1 0 16 16
```
Seems the `earliest` config does not work.
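
For reference, a hedged illustration of the consumer setting in question (not the suite's actual code); with `auto.offset.reset` set to `earliest`, a fresh consumer group should start from offset 0, so a first batch of `pat1 0 16 16` suggests the setting was not applied:

```scala
import org.apache.kafka.common.serialization.StringDeserializer

// Typical Kafka 0.10 consumer parameters for starting from the earliest offsets.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "pattern-test-group",   // hypothetical group id
  "auto.offset.reset" -> "earliest"
)
```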



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15351: [SPARK-17612][SQL][branch-2.0] Support `DESCRIBE table P...

2016-10-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15351
  
@hvanhovell If you are busy, I can take a look at this. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-06 Thread kayousterhout
Github user kayousterhout commented on the issue:

https://github.com/apache/spark/pull/15249
  
@mridulm re: job failures, can you elaborate on the job failure scenario 
you're concerned about?

Jobs can only fail when some tasks are unschedulable, which can happen if a 
task is permanently blacklisted on all available nodes. This can only happen 
when the number of nodes is smaller than the maximum number of failures for a 
particular task attempt, and it also seems very similar to existing behavior: 
currently, if a task is blacklisted on all nodes (even though the blacklist is 
temporary), the job will be failed 
(https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L595).
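
A simplified sketch of that condition (hypothetical signatures, not the actual TaskSetManager code): if some pending task is blacklisted on every live executor, the task set cannot make progress and has to be aborted.

```scala
// Returns the index of a pending task that no live executor can run, if any.
def findCompletelyBlacklistedTask(
    pendingTasks: Seq[Int],
    liveExecutors: Seq[(String, String)],            // (executorId, host)
    isBlacklisted: (Int, String, String) => Boolean  // (taskIndex, executorId, host)
  ): Option[Int] =
  pendingTasks.find { task =>
    liveExecutors.forall { case (exec, host) => isBlacklisted(task, exec, host) }
  }
```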


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15355: [SPARK-17782][STREAMING] Disable Kafka 010 pattern based...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15355
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66463/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15369: [SPARK-17795] [Web UI] Sorting on stage or job ta...

2016-10-06 Thread ajbozarth
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/15369#discussion_r82301075
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala ---
@@ -241,7 +246,8 @@ private[ui] class StagePagedTable(
 val headerLink = Unparsed(
   parameterPath +
 s"&$stageTag.sort=${URLEncoder.encode(header, "UTF-8")}" +
-s"&$stageTag.pageSize=$pageSize")
+s"&$stageTag.pageSize=$pageSize") +
+s"#$tableHeaderId"
--- End diff --

Yeah, those were already there for the quick links at the top of those pages.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15355: [SPARK-17782][STREAMING] Disable Kafka 010 pattern based...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15355
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15355: [SPARK-17782][STREAMING] Disable Kafka 010 pattern based...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15355
  
**[Test build #66463 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66463/consoleFull)**
 for PR 15355 at commit 
[`5ba78b6`](https://github.com/apache/spark/commit/5ba78b66c39c27c5e518f2cf394b64c8df88603e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15365: [SPARK-17157][SPARKR]: Add multiclass logistic regressio...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15365
  
**[Test build #66468 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66468/consoleFull)**
 for PR 15365 at commit 
[`0811fc3`](https://github.com/apache/spark/commit/0811fc3c71b665b99aec1e794d3782c98563e84c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15300: [SPARK-17729] [SQL] Enable creating hive bucketed tables

2016-10-06 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/15300
  
@hvanhovell, @cloud-fan: Can you please review this PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15384: [SPARK-17346][SQL][Tests]Fix the flaky topic deletion in...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15384
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66466/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15384: [SPARK-17346][SQL][Tests]Fix the flaky topic deletion in...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15384
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15384: [SPARK-17346][SQL][Tests]Fix the flaky topic deletion in...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15384
  
**[Test build #66466 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66466/consoleFull)**
 for PR 15384 at commit 
[`5158d5d`](https://github.com/apache/spark/commit/5158d5dcc925c8a384816cecbf1611233281ea9e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15249
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15249
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66462/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15249
  
**[Test build #66462 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66462/consoleFull)**
 for PR 15249 at commit 
[`34eff27`](https://github.com/apache/spark/commit/34eff27bf25d80d4b6d8a31e7cbbadd2794d2e9c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-06 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/15249
  
@squito I am hoping we _can_ actually remove the old code/functionality (it 
is clunky and very specific to the single-executor resource contention/shutdown 
use case - unfortunately common enough to warrant its introduction) and subsume it with 
a better design/implementation - perhaps as part of your work (in this and other PRs).

@kayousterhout I believe my concern with (2) is that the blacklist is 
(currently) permanent for a task/taskset on an executor/node. For jobs running on a 
larger number of executors, this will perhaps not be too much of an issue 
(other than a degradation in performance); but as the executor/node count 
decreases, we increase the probability of job failures even when the transient 
failures are recoverable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15371: [SPARK-17463] [Core] Fix ConcurrentModificationException...

2016-10-06 Thread seyfe
Github user seyfe commented on the issue:

https://github.com/apache/spark/pull/15371
  
Hi @zsxwing. I have a fix ready and am testing it now. I will create a new 
ticket and send an updated PR today.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15360: [SPARK-17073] [SQL] [FOLLOWUP] generate column-le...

2016-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15360#discussion_r82297698
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -405,6 +405,78 @@ class StatisticsSuite extends QueryTest with 
TestHiveSingleton with SQLTestUtils
 }
   }
 
+  test("check column statistics for case sensitive columns") {
+val tableName = "tbl"
+// scalastyle:off
+// non ascii characters are not allowed in the source code, so we 
disable the scalastyle.
+val columnGroups: Seq[(String, String)] = Seq(("c1", "C1"), ("列c", 
"列C"))
+// scalastyle:on
+columnGroups.foreach { case (column1, column2) =>
+  withTable(tableName) {
+withSQLConf("spark.sql.caseSensitive" -> "true") {
+  sql(s"CREATE TABLE $tableName (`$column1` int, `$column2` 
double) USING PARQUET")
+  sql(s"INSERT INTO $tableName SELECT 1, 3.0")
+  sql(s"ANALYZE TABLE $tableName COMPUTE STATISTICS FOR COLUMNS 
`$column1`, `$column2`")
+  val readback = spark.table(tableName)
+  val relations = readback.queryExecution.analyzed.collect { case 
rel: LogicalRelation =>
+val columnStats = rel.catalogTable.get.stats.get.colStats
+assert(columnStats.size == 2)
+StatisticsTest.checkColStat(
+  dataType = IntegerType,
+  colStat = columnStats(column1),
+  expectedColStat = ColumnStat(InternalRow(0L, 1, 1, 1L)),
+  rsd = spark.sessionState.conf.ndvMaxError)
+StatisticsTest.checkColStat(
+  dataType = DoubleType,
+  colStat = columnStats(column2),
+  expectedColStat = ColumnStat(InternalRow(0L, 3.0d, 3.0d, 
1L)),
+  rsd = spark.sessionState.conf.ndvMaxError)
+rel
+  }
+  assert(relations.size == 1)
+}
+  }
+}
+  }
+
+  test("test refreshing statistics of cached data source table") {
+val tableName = "tbl"
+withTable(tableName) {
+  val tableIndent = TableIdentifier(tableName, Some("default"))
+  val catalog = 
spark.sessionState.catalog.asInstanceOf[HiveSessionCatalog]
+  sql(s"CREATE TABLE $tableName (key int) USING PARQUET")
+  sql(s"INSERT INTO $tableName SELECT 1")
+  sql(s"ANALYZE TABLE $tableName COMPUTE STATISTICS")
+  sql(s"ANALYZE TABLE $tableName COMPUTE STATISTICS FOR COLUMNS key")
+  // Table lookup will make the table cached.
+  catalog.lookupRelation(tableIndent)
+
+  val cachedTable1 = catalog.getCachedDataSourceTable(tableIndent)
+  assert(cachedTable1.statistics.sizeInBytes > 0)
+  assert(cachedTable1.statistics.rowCount.contains(1))
+  StatisticsTest.checkColStat(
+dataType = IntegerType,
+colStat = cachedTable1.statistics.colStats("key"),
+expectedColStat = ColumnStat(InternalRow(0L, 1, 1, 1L)),
+rsd = spark.sessionState.conf.ndvMaxError)
+
+  sql(s"INSERT INTO $tableName SELECT 2")
+  sql(s"ANALYZE TABLE $tableName COMPUTE STATISTICS")
+  sql(s"ANALYZE TABLE $tableName COMPUTE STATISTICS FOR COLUMNS key")
--- End diff --

Both of the above DDL statements will call `refreshTable` with the same table name, 
right? And if the source code removed either `refreshTable` call, the test case 
would still pass, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15360: [SPARK-17073] [SQL] [FOLLOWUP] generate column-le...

2016-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15360#discussion_r82297526
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -405,6 +405,78 @@ class StatisticsSuite extends QueryTest with 
TestHiveSingleton with SQLTestUtils
 }
   }
 
+  test("check column statistics for case sensitive columns") {
+val tableName = "tbl"
+// scalastyle:off
+// non ascii characters are not allowed in the source code, so we 
disable the scalastyle.
+val columnGroups: Seq[(String, String)] = Seq(("c1", "C1"), ("列c", 
"列C"))
+// scalastyle:on
+columnGroups.foreach { case (column1, column2) =>
+  withTable(tableName) {
+withSQLConf("spark.sql.caseSensitive" -> "true") {
+  sql(s"CREATE TABLE $tableName (`$column1` int, `$column2` 
double) USING PARQUET")
+  sql(s"INSERT INTO $tableName SELECT 1, 3.0")
+  sql(s"ANALYZE TABLE $tableName COMPUTE STATISTICS FOR COLUMNS 
`$column1`, `$column2`")
+  val readback = spark.table(tableName)
+  val relations = readback.queryExecution.analyzed.collect { case 
rel: LogicalRelation =>
+val columnStats = rel.catalogTable.get.stats.get.colStats
+assert(columnStats.size == 2)
+StatisticsTest.checkColStat(
+  dataType = IntegerType,
+  colStat = columnStats(column1),
+  expectedColStat = ColumnStat(InternalRow(0L, 1, 1, 1L)),
+  rsd = spark.sessionState.conf.ndvMaxError)
+StatisticsTest.checkColStat(
+  dataType = DoubleType,
+  colStat = columnStats(column2),
+  expectedColStat = ColumnStat(InternalRow(0L, 3.0d, 3.0d, 
1L)),
+  rsd = spark.sessionState.conf.ndvMaxError)
+rel
+  }
+  assert(relations.size == 1)
+}
+  }
+}
+  }
+
+  test("test refreshing statistics of cached data source table") {
+val tableName = "tbl"
+withTable(tableName) {
+  val tableIndent = TableIdentifier(tableName, Some("default"))
+  val catalog = 
spark.sessionState.catalog.asInstanceOf[HiveSessionCatalog]
+  sql(s"CREATE TABLE $tableName (key int) USING PARQUET")
+  sql(s"INSERT INTO $tableName SELECT 1")
+  sql(s"ANALYZE TABLE $tableName COMPUTE STATISTICS")
+  sql(s"ANALYZE TABLE $tableName COMPUTE STATISTICS FOR COLUMNS key")
+  // Table lookup will make the table cached.
+  catalog.lookupRelation(tableIndent)
+
+  val cachedTable1 = catalog.getCachedDataSourceTable(tableIndent)
+  assert(cachedTable1.statistics.sizeInBytes > 0)
+  assert(cachedTable1.statistics.rowCount.contains(1))
+  StatisticsTest.checkColStat(
+dataType = IntegerType,
+colStat = cachedTable1.statistics.colStats("key"),
+expectedColStat = ColumnStat(InternalRow(0L, 1, 1, 1L)),
+rsd = spark.sessionState.conf.ndvMaxError)
+
+  sql(s"INSERT INTO $tableName SELECT 2")
+  sql(s"ANALYZE TABLE $tableName COMPUTE STATISTICS")
--- End diff --

What is the purpose of this DDL?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15370: [SPARK-17417][Core] Fix # of partitions for Relia...

2016-10-06 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15370#discussion_r82296991
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -71,6 +72,11 @@ private[spark] object CallSite {
 private[spark] object Utils extends Logging {
   val random = new Random()
 
+  // Use a common file numbering format which defaults to 5 digits for 
saving various part files.
+  val numFormatter = NumberFormat.getIntegerInstance()
--- End diff --

This is a global utility class; it shouldn't have an object so specific to 
one usage
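
One way to keep such a formatter out of the shared `Utils` object (a sketch of the suggestion, not the PR's actual change) is to build it at the single call site that produces the part-file name:

```scala
import java.text.NumberFormat

// Format split indices as zero-padded, ungrouped integers, e.g. part-00007.
def partFileName(splitIndex: Int): String = {
  val nf = NumberFormat.getIntegerInstance
  nf.setMinimumIntegerDigits(5)
  nf.setGroupingUsed(false)
  s"part-${nf.format(splitIndex.toLong)}"
}
```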


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15218: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-06 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/15218#discussion_r82296969
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala 
---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.mutable.PriorityQueue
+import scala.util.Random
+
+import org.apache.spark.SparkConf
+
+case class OfferState(workOffer: WorkerOffer, var cores: Int) {
+  // Build a list of tasks to assign to each worker.
+  val tasks = new ArrayBuffer[TaskDescription](cores)
+}
+
+abstract class TaskAssigner(conf: SparkConf) {
+  var offer: Seq[OfferState] = _
+  val CPUS_PER_TASK = conf.getInt("spark.task.cpus", 1)
+
+  // The final assigned offer returned to TaskScheduler.
+  def tasks(): Seq[ArrayBuffer[TaskDescription]] = offer.map(_.tasks)
+
+  // construct the assigner by the workoffer.
+  def construct(workOffer: Seq[WorkerOffer]): Unit = {
+offer = workOffer.map(o => OfferState(o, o.cores))
+  }
+
+  // Invoked in each round of Taskset assignment to initialize the 
internal structure.
+  def init(): Unit
+
+  // Indicating whether there is offer available to be used by one round 
of Taskset assignment.
+  def hasNext(): Boolean
+
+  // Next available offer returned to one round of Taskset assignment.
+  def getNext(): OfferState
+
+  // Called by the TaskScheduler to indicate whether the current offer is 
accepted
+  // In order to decide whether the current is valid for the next offering.
+  def taskAssigned(assigned: Boolean): Unit
+
+  // Release internally maintained resources. Subclass is responsible to
+  // release its own private resources.
+  def reset: Unit = {
+offer = null
+  }
+}
+
+class RoundRobinAssigner(conf: SparkConf) extends TaskAssigner(conf) {
+  var i = 0
+  override def construct(workOffer: Seq[WorkerOffer]): Unit = {
+offer = Random.shuffle(workOffer.map(o => OfferState(o, o.cores)))
+  }
+  override def init(): Unit = {
+i = 0
+  }
+  override def hasNext: Boolean = {
+i < offer.size
+  }
+  override def getNext(): OfferState = {
+offer(i)
+  }
+  override def taskAssigned(assigned: Boolean): Unit = {
+i += 1
+  }
+  override def reset: Unit = {
+super.reset
+i = 0
+  }
+}
+
+class BalancedAssigner(conf: SparkConf) extends TaskAssigner(conf) {
--- End diff --

Returning 0 implies equality - which is not the case here (x != y but 
x.cores == y.cores).
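
A small illustration of the point (hypothetical names, not the PR's code): an ordering defined only on `cores` returns 0 for two distinct offers with the same core count, so it is not consistent with equality; adding a deterministic tie-breaker avoids that.

```scala
import scala.collection.mutable.PriorityQueue

case class Offer(id: Int, cores: Int)

// Compare by cores, then break ties by id so distinct offers never compare equal.
val byCoresThenId: Ordering[Offer] =
  Ordering.by[Offer, (Int, Int)](o => (o.cores, o.id))

val queue = PriorityQueue(Offer(1, 4), Offer(2, 4), Offer(3, 8))(byCoresThenId)
// queue.dequeue() returns Offer(3, 8) first; equal-core offers come out in id order.
```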


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15378: [SPARK-17803][TESTS] Upgrade docker-client dependency

2016-10-06 Thread ckadner
Github user ckadner commented on the issue:

https://github.com/apache/spark/pull/15378
  
Thanks @JoshRosen


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15375: [SPARK-17790] Support for parallelizing R data.fr...

2016-10-06 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/15375#discussion_r82296182
  
--- Diff: R/pkg/R/context.R ---
@@ -123,19 +126,46 @@ parallelize <- function(sc, coll, numSlices = 1) {
   if (numSlices > length(coll))
 numSlices <- length(coll)
 
+  sizeLimit <- as.numeric(
+sparkR.conf("spark.r.maxAllocationLimit", 
toString(.Machine$integer.max - 10240)))
+  objectSize <- object.size(coll)
--- End diff --

Since the size estimate could easily be wrong, and writing the data to disk 
is not that bad anyway, should we use a much smaller default value (for 
example, 100M)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15365: [SPARK-17157][SPARKR]: Add multiclass logistic regressio...

2016-10-06 Thread wangmiao1981
Github user wangmiao1981 commented on the issue:

https://github.com/apache/spark/pull/15365
  
re-test please 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15375: [SPARK-17790] Support for parallelizing R data.frame lar...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15375
  
**[Test build #66467 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66467/consoleFull)**
 for PR 15375 at commit 
[`8e065c1`](https://github.com/apache/spark/commit/8e065c100389bd5e89f02ffb43319bb2089a44c5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15384: [SPARK-17346][SQL][Tests]Fix the flaky topic deletion in...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15384
  
**[Test build #66466 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66466/consoleFull)**
 for PR 15384 at commit 
[`5158d5d`](https://github.com/apache/spark/commit/5158d5dcc925c8a384816cecbf1611233281ea9e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15384: [SPARK-17346][SQL]Fix the flaky topic deletion in...

2016-10-06 Thread zsxwing
GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/15384

[SPARK-17346][SQL]Fix the flaky topic deletion in tests

## What changes were proposed in this pull request?

A follow-up PR for SPARK-17346 to fix the flaky 
`org.apache.spark.sql.kafka010.KafkaSourceStressSuite`.

Test log: 
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.4/1855/testReport/junit/org.apache.spark.sql.kafka010/KafkaSourceStressSuite/_It_is_not_a_test_/

It looks like deleting the Kafka internal topic `__consumer_offsets` is flaky. 
This PR simply ignores internal topics.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark SPARK-17346-flaky-test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15384.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15384


commit 5158d5dcc925c8a384816cecbf1611233281ea9e
Author: Shixiong Zhu 
Date:   2016-10-06T22:00:04Z

Fix the flaky topic deletion in tests




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15384: [SPARK-17346][SQL][Tests]Fix the flaky topic deletion in...

2016-10-06 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/15384
  
/cc @tdas 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


