date:20161203

[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16098
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69619/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16098
  
**[Test build #69619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69619/consoleFull)** for PR 16098 at commit [`4804862`](https://github.com/apache/spark/commit/48048622067f092ed247bc555e5461c073894a9c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16129
  
**[Test build #3466 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3466/consoleFull)** for PR 16129 at commit [`8ac5dee`](https://github.com/apache/spark/commit/8ac5dee8f9c0165da7a16d83d79f2f5080edb3ec).



[GitHub] spark pull request #16114: [SPARK-18620][Streaming][Kinesis] Flatten input r...

2016-12-03 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16114#discussion_r90756702
  
--- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisRecordProcessor.scala ---
@@ -56,6 +56,27 @@ private[kinesis] class KinesisRecordProcessor[T](receiver: KinesisReceiver[T], w
     logInfo(s"Initialized workerId $workerId with shardId $shardId")
   }
 
+  private def addRecords(batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
+    receiver.addRecords(shardId, batch)
+    logDebug(s"Stored: Worker $workerId stored ${batch.size} records for shardId $shardId")
+    receiver.setCheckpointer(shardId, checkpointer)
--- End diff --

BTW is this supposed to be called on every batch or once at the end? I 
don't know how it works.



[GitHub] spark pull request #16114: [SPARK-18620][Streaming][Kinesis] Flatten input r...

2016-12-03 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16114#discussion_r90756693
  
--- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisRecordProcessor.scala ---
@@ -56,6 +56,27 @@ private[kinesis] class KinesisRecordProcessor[T](receiver: KinesisReceiver[T], w
     logInfo(s"Initialized workerId $workerId with shardId $shardId")
   }
 
+  private def addRecords(batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
+    receiver.addRecords(shardId, batch)
+    logDebug(s"Stored: Worker $workerId stored ${batch.size} records for shardId $shardId")
+    receiver.setCheckpointer(shardId, checkpointer)
+  }
+
+  /**
+   * Limit the number of processed records from Kinesis stream. This is because the KCL cannot
+   * control the number of aggregated records to be fetched even if we set `MaxRecords`
+   * in `KinesisClientLibConfiguration`. For example, if we set 10 to the number of max records
+   * in a worker and a producer aggregates two records into one message, the worker possibly
+   * 20 records every callback function called.
+   */
+  private def processRecordsWithLimit(
+      batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
+    val maxRecords = receiver.getCurrentLimit
+    for (start <- 0 until batch.size by maxRecords) {
--- End diff --

Hm, it just occurred to me that you would have a problem here if batch.size 
and maxRecords were both over Int.MaxValue / 2, and maxRecords were a bit 
smaller than batch.size. The addition below overflows.

It seems like a corner case but I note above you already defensively capped 
the maxRecords at Int.MaxValue so maybe it's less unlikely than it sounds.

You can fix it by letting the addition and min comparison take place over 
longs and then convert back to int.

Alternatively I think this is even simpler in Scala, though I imagine 
there's some extra overhead here:

```
batch.grouped(maxRecords).foreach(batch => addRecords(batch, checkpointer))
```

I don't know of a good reviewer for this component but I think I'm 
comfortable merging a straightforward change like this.
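
A minimal sketch of the two fixes mentioned above, not the actual patch. It assumes the overflowing addition is `start + maxRecords` (that line falls outside the quoted hunk) and uses a plain Scala `Seq` in place of the real batch type. The names `ChunkingSketch`, `safeEnd`, `naiveEnd`, and `chunkSizes` are hypothetical and exist only for illustration:

```scala
object ChunkingSketch {
  // Assumed shape of the problematic code: the sum is computed in Int,
  // so it can wrap around to a negative number before `min` is applied.
  def naiveEnd(start: Int, maxRecords: Int, size: Int): Int =
    math.min(start + maxRecords, size)

  // Widen to Long for the addition and the comparison, then narrow back.
  // The result never exceeds `size`, so the final `.toInt` is safe.
  def safeEnd(start: Int, maxRecords: Int, size: Int): Int =
    math.min(start.toLong + maxRecords.toLong, size.toLong).toInt

  // The `grouped` alternative: no index arithmetic at all.
  // Requires maxRecords > 0; returns the size of each chunk.
  def chunkSizes[A](batch: Seq[A], maxRecords: Int): List[Int] =
    batch.grouped(maxRecords).map(_.size).toList
}
```

With `maxRecords = Int.MaxValue - 5` and `start = maxRecords` (the second loop iteration), `naiveEnd` goes negative while `safeEnd` stays capped at the batch size. The `grouped` form trades some allocation for not having to reason about the bounds.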



[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...

2016-12-03 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15998#discussion_r90756432
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -519,6 +519,26 @@ private[hive] class HiveClientImpl(
     client.alterPartitions(table, newParts.map { p => toHivePartition(p, hiveTable) }.asJava)
   }
 
+  /**
+   * Returns the partition names for the given table that match the supplied partition spec.
+   * If no partition spec is specified, all partitions are returned.
+   *
+   * The returned sequence is sorted as strings.
--- End diff --

But given how we use this API in Spark SQL, we don't need to sort, right? 
Maybe we can just do the sorting in the test.



[GitHub] spark pull request #16068: [SPARK-18637][SQL]Stateful UDF should be consider...

2016-12-03 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16068#discussion_r90756326
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala ---
@@ -144,7 +144,7 @@ private[hive] case class HiveGenericUDF(
   @transient
   private lazy val isUDFDeterministic = {
     val udfType = function.getClass.getAnnotation(classOf[HiveUDFType])
-    udfType != null && udfType.deterministic()
+    udfType != null && udfType.deterministic() && !udfType.stateful()
--- End diff --

An unrelated question: what's the difference between 
`udfType.deterministic` and `udfType.stateful`?



[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16122
  
**[Test build #69622 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69622/consoleFull)** for PR 16122 at commit [`f8955df`](https://github.com/apache/spark/commit/f8955dfc966ae41fbe2086168d62d44d61e15576).



[GitHub] spark issue #15995: [SPARK-18566][SQL] remove OverwriteOptions

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15995
  
**[Test build #69623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69623/consoleFull)** for PR 15995 at commit [`b5f4394`](https://github.com/apache/spark/commit/b5f43946fd72932f7e23ac1f1b3866b150fe745b).



[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16129
  
Merged build finished. Test FAILed.



[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16129
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69618/
Test FAILed.



[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16129
  
**[Test build #69618 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69618/consoleFull)** for PR 16129 at commit [`8ac5dee`](https://github.com/apache/spark/commit/8ac5dee8f9c0165da7a16d83d79f2f5080edb3ec).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-12-03 Thread holdenk

Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/11105
  
I'm down with the idea of having `add` and `merge` not be final, with huge 
warning signs, and we could switch them to final in 3.X.



[GitHub] spark pull request #11105: [SPARK-12469][CORE] Data Property accumulators fo...

2016-12-03 Thread holdenk

Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11105#discussion_r90755993
  
--- Diff: core/src/test/scala/org/apache/spark/DataPropertyAccumulatorSuite.scala ---
@@ -0,0 +1,395 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import scala.concurrent.ExecutionContext.Implicits.global
+import scala.ref.WeakReference
+
+import org.scalatest.Matchers
+
+import org.apache.spark.scheduler._
+import org.apache.spark.util.{AccumulatorContext, AccumulatorMetadata, AccumulatorV2, LongAccumulator}
+
+
+class DataPropertyAccumulatorSuite extends SparkFunSuite with Matchers with LocalSparkContext {
--- End diff --

That sounds like a good plan. I'll try to give the tests more descriptive 
names, and where that isn't enough, explain in comments more about the 
functionality they are testing.



[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16114
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69620/
Test PASSed.



[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16114
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16114
  
**[Test build #69620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69620/consoleFull)** for PR 16114 at commit [`f381ac2`](https://github.com/apache/spark/commit/f381ac26cfd14420dbe21b1d58be54c201542357).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2016-12-03 Thread steveloughran

Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/12004
  
Test failure due to new artifacts
```
+++ b/dev/pr-deps/spark-deps-hadoop-2.7
@@ -16,8 +16,6 @@ arpack_combined_all-0.1.jar
 avro-1.7.7.jar
 avro-ipc-1.7.7.jar
 avro-mapred-1.7.7-hadoop2.jar
-aws-java-sdk-1.7.4.jar
-azure-storage-2.0.0.jar
 base64-2.3.8.jar
 bcprov-jdk15on-1.51.jar
 bonecp-0.8.0.RELEASE.jar
@@ -63,8 +61,6 @@ guice-3.0.jar
 guice-servlet-3.0.jar
 hadoop-annotations-2.7.3.jar
 hadoop-auth-2.7.3.jar
-hadoop-aws-2.7.3.jar
-hadoop-azure-2.7.3.jar
 hadoop-client-2.7.3.jar
 hadoop-common-2.7.3.jar
 hadoop-hdfs-2.7.3.jar
@@ -73,7 +69,6 @@ hadoop-mapreduce-client-common-2.7.3.jar
 hadoop-mapreduce-client-core-2.7.3.jar
 hadoop-mapreduce-client-jobclient-2.7.3.jar
 hadoop-mapreduce-client-shuffle-2.7.3.jar
-hadoop-hadoop-openstack-2.7.3.jar
 hadoop-yarn-api-2.7.3.jar
 hadoop-yarn-client-2.7.3.jar
 hadoop-yarn-common-2.7.3.jar
```



[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16030
  
**[Test build #69621 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69621/consoleFull)** for PR 16030 at commit [`1ab3363`](https://github.com/apache/spark/commit/1ab3363746d9c53fdcdf24564020fe3a784be06a).



[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-03 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/16030
  
Jenkins, retest this please.



[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-03 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/16030
  
The failure seems to be unrelated to this PR?



[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16114
  
**[Test build #69620 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69620/consoleFull)** for PR 16114 at commit [`f381ac2`](https://github.com/apache/spark/commit/f381ac26cfd14420dbe21b1d58be54c201542357).



[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...

2016-12-03 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/16114
  
@srowen Do you know of any qualified maintainers for this component?



[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16030
  
Merged build finished. Test FAILed.



[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16030
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69617/
Test FAILed.



[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16030
  
**[Test build #69617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69617/consoleFull)** for PR 16030 at commit [`1ab3363`](https://github.com/apache/spark/commit/1ab3363746d9c53fdcdf24564020fe3a784be06a).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13909
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69616/
Test PASSed.



[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13909
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13909
  
**[Test build #69616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69616/consoleFull)** for PR 13909 at commit [`b29d7cf`](https://github.com/apache/spark/commit/b29d7cf11a6b13f979ad96e1f1879409daf3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #16114: [SPARK-18620][Streaming][Kinesis] Flatten input r...

2016-12-03 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/16114#discussion_r90754922
  
--- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisRecordProcessor.scala ---
@@ -56,6 +56,31 @@ private[kinesis] class KinesisRecordProcessor[T](receiver: KinesisReceiver[T], w
     logInfo(s"Initialized workerId $workerId with shardId $shardId")
   }
 
+  private def addRecords(batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
+    receiver.addRecords(shardId, batch)
+    logDebug(s"Stored: Worker $workerId stored ${batch.size} records for shardId $shardId")
+    receiver.setCheckpointer(shardId, checkpointer)
+  }
+
+  /**
+   * Limit the number of processed records from Kinesis stream. This is because the KCL cannot
+   * control the number of aggregated records to be fetched even if we set `MaxRecords`
+   * in `KinesisClientLibConfiguration`. For example, if we set 10 to the number of max records
+   * in a worker and a producer aggregates two records into one message, the worker possibly
+   * 20 records every callback function called.
+   */
+  private def processRecordsWithLimit(
+      batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
+    val maxRecords = receiver.getCurrentLimit
+    if (batch.size() <= maxRecords) {
+      addRecords(batch, checkpointer)
--- End diff --

Aha, I see. I'll fix, thanks!



[GitHub] spark pull request #16069: [SPARK-18638][BUILD] Upgrade sbt, Zinc, and Maven...

2016-12-03 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16069



[GitHub] spark issue #16069: [SPARK-18638][BUILD] Upgrade sbt, Zinc, and Maven plugin...

2016-12-03 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16069
  
Merged to master. It's a build change and probably fine for 2.1 but it's 
non-trivial.



[GitHub] spark pull request #16031: [SPARK-18606][HISTORYSERVER]remove useless elemen...

2016-12-03 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16031#discussion_r90754812
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/historypage.js ---
@@ -78,6 +78,12 @@ jQuery.extend( jQuery.fn.dataTableExt.oSort, {
     }
 } );
 
+jQuery.extend( jQuery.fn.dataTableExt.ofnSearch, {
+    "appid-numeric": function ( a ) {
+        return a.replace(/[\r\n]/g, " ").replace(/<.*?>/g, "");
--- End diff --

@WangTaoTheTonic does that make sense / do you have time to look into this 
alternative?
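
For readers following the diff, here is a minimal sketch of what the search function's two `replace` calls do, rendered in Scala for illustration only. `SearchTextSketch` and `normalize` are hypothetical names, not part of the PR, and the alternative mentioned above is not shown in this excerpt, so the sketch covers only the line under review.

```scala
object SearchTextSketch {
  /** Replaces line breaks with spaces, then drops anything that looks like an HTML tag. */
  def normalize(cell: String): String =
    cell
      .replaceAll("[\\r\\n]", " ") // same character class as /[\r\n]/g in the diff
      .replaceAll("<.*?>", "")     // non-greedy, so each tag is removed separately
}
```

Because the tag pattern is non-greedy, the text between an opening and a closing tag survives, which is what lets the table search match on the visible application id rather than on the surrounding markup.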



[GitHub] spark issue #16120: [SPARK-18634][PySpark][SQL][WIP] Corruption and Correctn...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16120
  
Merged build finished. Test PASSed.



[GitHub] spark pull request #16103: [SPARK-18374][ML]Incorrect words in StopWords/eng...

2016-12-03 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16103#discussion_r90754782
  
--- Diff: mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/english.txt ---
@@ -149,5 +149,58 @@ shan
 shouldn
 wasn
 weren
-won
 wouldn
--- End diff --

You would then remove the other stems like "wasn" "weren" etc right?



[GitHub] spark pull request #16114: [SPARK-18620][Streaming][Kinesis] Flatten input r...

2016-12-03 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16114#discussion_r90754731
  
--- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisRecordProcessor.scala ---
@@ -56,6 +56,31 @@ private[kinesis] class KinesisRecordProcessor[T](receiver: KinesisReceiver[T], w
     logInfo(s"Initialized workerId $workerId with shardId $shardId")
   }
 
+  private def addRecords(batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
+    receiver.addRecords(shardId, batch)
+    logDebug(s"Stored: Worker $workerId stored ${batch.size} records for shardId $shardId")
+    receiver.setCheckpointer(shardId, checkpointer)
+  }
+
+  /**
+   * Limit the number of processed records from Kinesis stream. This is because the KCL cannot
+   * control the number of aggregated records to be fetched even if we set `MaxRecords`
+   * in `KinesisClientLibConfiguration`. For example, if we set 10 to the number of max records
+   * in a worker and a producer aggregates two records into one message, the worker possibly
+   * 20 records every callback function called.
+   */
+  private def processRecordsWithLimit(
+      batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
+    val maxRecords = receiver.getCurrentLimit
+    if (batch.size() <= maxRecords) {
+      addRecords(batch, checkpointer)
--- End diff --

I think the for loop even takes care of this case, but no big deal either 
way. It seems like a reasonable change.
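
To illustrate the point about the loop, here is a rough sketch of slicing a batch by a record limit. The loop from the PR is not part of this excerpt, so this is an assumption about its general shape rather than the actual code; `BatchLimitSketch`, `storeWithLimit`, and the `store` callback (standing in for the receiver call) are hypothetical names.

```scala
object BatchLimitSketch {
  /**
   * Hands `batch` to `store` in slices of at most `maxRecords` elements.
   * `store` is a placeholder for whatever persists one slice.
   */
  def storeWithLimit[T](batch: Seq[T], maxRecords: Int)(store: Seq[T] => Unit): Unit = {
    require(maxRecords > 0, "maxRecords must be positive")
    // grouped yields exactly one slice when batch.size <= maxRecords,
    // so the small-batch case needs no separate branch.
    batch.grouped(maxRecords).foreach(store)
  }
}
```

One difference worth noting: for an empty batch `grouped` yields nothing, so `store` is never called, whereas an explicit `batch.size <= maxRecords` branch would still run once. Whether that matters depends on side effects such as setting the checkpointer.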



[GitHub] spark issue #16120: [SPARK-18634][PySpark][SQL][WIP] Corruption and Correctn...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16120
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69615/
Test PASSed.



[GitHub] spark issue #16120: [SPARK-18634][PySpark][SQL][WIP] Corruption and Correctn...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16120
  
**[Test build #69615 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69615/consoleFull)** for PR 16120 at commit [`a5594f7`](https://github.com/apache/spark/commit/a5594f7ffcbdc9ab2e83008a99d5878fa9fae2b8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16098
  
**[Test build #69619 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69619/consoleFull)** for PR 16098 at commit [`4804862`](https://github.com/apache/spark/commit/48048622067f092ed247bc555e5461c073894a9c).



[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-12-03 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16037
  
Yes I'm pretty OK with merging this. If you can dig up any results, that's 
all the better. Will check in with you next week.



[GitHub] spark issue #16116: [SPARK-18685][TESTS] Fix URI and release resources after...

2016-12-03 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16116
  
Thank you !!



[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...

2016-12-03 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16098
  
retest this please



[GitHub] spark pull request #16116: [SPARK-18685][TESTS] Fix URI and release resource...

2016-12-03 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16116



[GitHub] spark issue #16116: [SPARK-18685][TESTS] Fix URI and release resources after...

2016-12-03 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16116
  
Merged to master/2.1/2.0



[GitHub] spark pull request #16102: [SPARK-18586][BUILD] netty-3.8.0.Final.jar has vu...

2016-12-03 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16102



[GitHub] spark issue #16102: [SPARK-18586][BUILD] netty-3.8.0.Final.jar has vulnerabi...

2016-12-03 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16102
  
Merged to master, though as I say I don't think the CVE actually impacted 
Spark to begin with.



[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16129
  
**[Test build #69618 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69618/consoleFull)** for PR 16129 at commit [`8ac5dee`](https://github.com/apache/spark/commit/8ac5dee8f9c0165da7a16d83d79f2f5080edb3ec).



[GitHub] spark pull request #16129: [SPARK-18678][ML] Skewed feature subsampling in R...

2016-12-03 Thread srowen

GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/16129

[SPARK-18678][ML] Skewed feature subsampling in Random forest

## What changes were proposed in this pull request?

Fix reservoir sampling bias for small k. An off-by-one error meant that the 
probability of replacement was slightly too high -- k/(l-1) after l elements 
instead of k/l, which matters for small k.
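
As a rough illustration of the k/l replacement probability described above, here is a minimal reservoir sampling sketch. It is not the code changed by this pull request; `ReservoirSketch`, `sample`, and the `seen` counter (playing the role of l) are names made up for the example.

```scala
import scala.util.Random

object ReservoirSketch {
  /** Draws up to k items uniformly at random from the iterator in a single pass. */
  def sample[T](input: Iterator[T], k: Int, rng: Random): Vector[T] = {
    require(k >= 0, "k must be non-negative")
    val reservoir = scala.collection.mutable.ArrayBuffer.empty[T]
    var seen = 0L // number of elements consumed so far (l in the description)
    while (input.hasNext) {
      val item = input.next()
      seen += 1
      if (reservoir.size < k) {
        // The first k elements always fill the reservoir.
        reservoir += item
      } else if (k > 0) {
        // Pick a slot uniformly from [0, seen). The item replaces an entry only
        // when the slot falls inside the reservoir, i.e. with probability k / seen.
        val slot = (rng.nextDouble() * seen).toLong
        if (slot < k) reservoir(slot.toInt) = item
      }
    }
    reservoir.toVector
  }
}
```

In this sketch, drawing the slot from a range that is one element too small, `[0, seen - 1)`, would give a replacement probability of k/(seen - 1), which is the kind of bias the description refers to. The relative error is largest while `seen` is small, so it distorts small samples the most.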

## How was this patch tested?

Existing test plus new test case.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-18678

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16129.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16129


commit 8ac5dee8f9c0165da7a16d83d79f2f5080edb3ec
Author: Sean Owen 
Date:   2016-12-03T09:32:00Z

Fix reservoir sampling bias for small k





[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16030
  
**[Test build #69617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69617/consoleFull)** for PR 16030 at commit [`1ab3363`](https://github.com/apache/spark/commit/1ab3363746d9c53fdcdf24564020fe3a784be06a).



[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-03 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/16030
  
Jenkins, retest this please.



[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13909
  
**[Test build #69616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69616/consoleFull)** for PR 13909 at commit [`b29d7cf`](https://github.com/apache/spark/commit/b29d7cf11a6b13f979ad96e1f1879409daf3).



[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...

2016-12-03 Thread eyalfa

Github user eyalfa commented on a diff in the pull request:

https://github.com/apache/spark/pull/16043#discussion_r90752975
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions.{Cast, CreateArray, CreateMap, CreateNamedStructLike, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, IntegerLiteral, Literal}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+
+/**
+* push down operations into [[CreateNamedStructLike]].
+*/
+object SimplifyCreateStructOps extends Rule[LogicalPlan]{
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformExpressionsUp{
+      // push down field extraction
+      case GetStructField( createNamedStructLike : CreateNamedStructLike, ordinal, _ ) =>
+        createNamedStructLike.valExprs(ordinal)
+    }
+  }
+}
+
+/**
+* push down operations into [[CreateArray]].
+*/
+object SimplifyCreateArrayOps extends Rule[LogicalPlan]{
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformExpressionsUp{
+      // push down field selection (array of structs)
+      case GetArrayStructFields(CreateArray(elems), field, ordinal, numFields, containsNull) =>
+        def getStructField( elem : Expression ) = {
+          GetStructField( elem, ordinal, Some(field.name) )
+        }
+        CreateArray( elems.map(getStructField) )
+      // push down item selection.
+      case ga @ GetArrayItem( CreateArray(elems), IntegerLiteral( idx ) ) =>
+        if ( idx >= 0 && idx < elems.size ) {
+          elems(idx)
+        } else {
+          Cast( Literal( null), ga.dataType )
+        }
+    }
+  }
+}
+
+/**
+* push down operations into [[CreateMap]].
+*/
+object SimplifyCreateMapOps extends Rule[LogicalPlan]{
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformExpressionsUp{
--- End diff --

@gatorsmile I've run a small regex on the spark source tree:
`git grep -En '[a-zA-Z][{]' -- *.scala`

This returns 277 places where this space is missing. Am I missing anything?
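
For anyone who wants to try the same check outside of `git grep`, here is a small sketch that applies the identical pattern to a list of source lines. `BraceSpacingSketch` and `flag` are hypothetical helper names, not an existing style rule or tool.

```scala
object BraceSpacingSketch {
  // Same pattern as the git grep call: a letter immediately followed by "{".
  private val missingSpace = "[a-zA-Z][{]".r

  /** Returns the 1-based numbers of the lines that match the pattern. */
  def flag(lines: Seq[String]): Seq[Int] =
    lines.zipWithIndex.collect {
      case (line, idx) if missingSpace.findFirstIn(line).isDefined => idx + 1
    }
}
```

The pattern is purely textual, so it also matches inside string literals and comments. The 277 hits may therefore include some lines that are not actual style violations.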



[GitHub] spark issue #16120: [SPARK-18634][PySpark][SQL][WIP] Corruption and Correctn...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16120
  
**[Test build #69615 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69615/consoleFull)**
 for PR 16120 at commit 
[`a5594f7`](https://github.com/apache/spark/commit/a5594f7ffcbdc9ab2e83008a99d5878fa9fae2b8).



[GitHub] spark issue #16120: [SPARK-18634][PySpark][SQL][WIP] Corruption and Correctn...

2016-12-03 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16120
  
retest this please.



[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16098
  
Merged build finished. Test FAILed.



[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16030
  
Merged build finished. Test FAILed.



[GitHub] spark issue #16120: [SPARK-18634][PySpark][SQL][WIP] Corruption and Correctn...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16120
  
Merged build finished. Test FAILed.



[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16098
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69614/
Test FAILed.



[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16030
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69611/
Test FAILed.



[GitHub] spark issue #16120: [SPARK-18634][PySpark][SQL][WIP] Corruption and Correctn...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16120
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69610/
Test FAILed.


