[GitHub] spark pull request #21385: [SPARK-24234][SS] Support multiple row writers in...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/21385#discussion_r190057347 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowReceiver.scala --- @@ -56,20 +62,46 @@ private[shuffle] class UnsafeRowReceiver( override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = { case r: UnsafeRowReceiverMessage => - queue.put(r) + queues(r.writerId).put(r) context.reply(()) } override def read(): Iterator[UnsafeRow] = { new NextIterator[UnsafeRow] { - override def getNext(): UnsafeRow = queue.take() match { -case ReceiverRow(r) => r -case ReceiverEpochMarker() => - finished = true - null + private val numWriterEpochMarkers = new AtomicInteger(0) --- End diff -- Does this need an AtomicInteger? It seems like this is only incremented and accessed from the main thread draining the iterator below. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
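A minimal, self-contained illustration of the point (hypothetical names, not the actual UnsafeRowReceiver code): when a counter is only incremented and read by the single thread draining the queue, a plain var behaves the same as an AtomicInteger.

```scala
import java.util.concurrent.BlockingQueue

sealed trait Msg
case class Row(value: Int) extends Msg
case object Marker extends Msg

// Hypothetical single-consumer reader: next() is only ever called from one thread,
// so the marker count never needs atomic updates.
class SingleThreadedReader(queue: BlockingQueue[Msg], numWriters: Int) {
  private var markersSeen = 0  // only mutated and read by the draining thread

  def next(): Option[Int] = {
    while (markersSeen < numWriters) {
      queue.take() match {
        case Row(v) => return Some(v)
        case Marker => markersSeen += 1  // no AtomicInteger needed
      }
    }
    None  // every writer has sent its epoch marker; the epoch is over
  }
}
```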
[GitHub] spark pull request #21385: [SPARK-24234][SS] Support multiple row writers in...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/21385#discussion_r190057920 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowReceiver.scala --- @@ -56,20 +62,46 @@ private[shuffle] class UnsafeRowReceiver( override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = { case r: UnsafeRowReceiverMessage => - queue.put(r) + queues(r.writerId).put(r) context.reply(()) } override def read(): Iterator[UnsafeRow] = { new NextIterator[UnsafeRow] { - override def getNext(): UnsafeRow = queue.take() match { -case ReceiverRow(r) => r -case ReceiverEpochMarker() => - finished = true - null + private val numWriterEpochMarkers = new AtomicInteger(0) + + private val executor = Executors.newFixedThreadPool(numShuffleWriters) + private val completion = new ExecutorCompletionService[UnsafeRowReceiverMessage](executor) + + private def completionTask(writerId: Int) = new Callable[UnsafeRowReceiverMessage] { +override def call(): UnsafeRowReceiverMessage = queues(writerId).take() } - override def close(): Unit = {} + (0 until numShuffleWriters).foreach(writerId => completion.submit(completionTask(writerId))) + + override def getNext(): UnsafeRow = { +completion.take().get() match { + case ReceiverRow(writerId, r) => +// Start reading the next element in the queue we just took from. +completion.submit(completionTask(writerId)) +r +// TODO use writerId + case ReceiverEpochMarker(writerId) => +// Don't read any more from this queue. If all the writers have sent epoch markers, +// the epoch is over; otherwise we need rows from one of the remaining writers. +val writersCompleted = numWriterEpochMarkers.incrementAndGet() +if (writersCompleted == numShuffleWriters) { + finished = true + null +} else { + getNext() --- End diff -- With a large number of shuffle writers, there is a very slight chance that this can lead to a stack overflow, when you accidentally get a sequence of N markers from N writers and N is large. Make this a while loop. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
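A rough sketch of the suggested rewrite, reusing the identifiers from the diff above (untested, purely illustrative): the recursion on an epoch marker becomes a loop, so a run of N consecutive markers costs N iterations rather than N stack frames.

```scala
override def getNext(): UnsafeRow = {
  var result: UnsafeRow = null
  // Loop instead of recursing, so a long run of epoch markers cannot overflow the stack.
  while (result == null && !finished) {
    completion.take().get() match {
      case ReceiverRow(writerId, r) =>
        // Start reading the next element from the queue we just took from.
        completion.submit(completionTask(writerId))
        result = r
      case ReceiverEpochMarker(writerId) =>
        // Don't resubmit for this writer. If all writers have sent markers, the epoch is over.
        if (numWriterEpochMarkers.incrementAndGet() == numShuffleWriters) {
          finished = true
        }
    }
  }
  result
}
```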
[GitHub] spark issue #21394: [SPARK-24329][SQL] Test for skipping multi-space lines
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21394 **[Test build #90997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90997/testReport)** for PR 21394 at commit [`70839e8`](https://github.com/apache/spark/commit/70839e8bf56338698d750a9d7830dca29ae13dce). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21394: [SPARK-24329][SQL] Test for skipping multi-space lines
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21394 jenkins, retest this, please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21082: [SPARK-22239][SQL][Python] Enable grouped aggregate pand...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21082 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3477/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21082: [SPARK-22239][SQL][Python] Enable grouped aggregate pand...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21082 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21401: [SPARK-24350][SQL] "array_position" error fix
Github user wajda commented on the issue: https://github.com/apache/spark/pull/21401 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21402: SPARK-24335 Spark external shuffle server improvement to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21402 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21402: SPARK-24335 Spark external shuffle server improve...
GitHub user Victsm opened a pull request: https://github.com/apache/spark/pull/21402 SPARK-24335 Spark external shuffle server improvement to better handle block fetch requests. ## What changes were proposed in this pull request? Right now, the default number of server-side netty handler threads is 2 * # of cores, and it can be further configured with the parameter spark.shuffle.io.serverThreads. In order to process a client request, one available server netty handler thread is required. However, when the server netty handler threads start to process ChunkFetchRequests, they will be blocked on disk I/O, mostly due to disk contention from the random read operations initiated by all the ChunkFetchRequests received from clients. As a result, when the shuffle server is serving many concurrent ChunkFetchRequests, the server-side netty handler threads could all be blocked on reading shuffle files, thus leaving no handler thread available to process other types of requests which should all be very quick to process. This issue could potentially be fixed by limiting the number of netty handler threads that could get blocked when processing ChunkFetchRequest. We have a patch to do this by using a separate EventLoopGroup with a dedicated ChannelHandler to process ChunkFetchRequest. This enables the shuffle server to reserve netty handler threads for non-ChunkFetchRequest messages, thus enabling consistent processing time for these requests which are fast to process. After deploying the patch in our infrastructure, we no longer see timeout issues with either executor registration with the local shuffle server or shuffle clients establishing connections with remote shuffle servers. ## How was this patch tested? Added a unit test for this patch. In addition, we built an internal tool to stress test the Spark shuffle service and have observed significant improvement after applying this patch. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Victsm/spark SPARK-24355 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21402.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21402 commit d862a2dec30f1fefe03d26e0b8a07d10eac7dbf5 Author: Min Shen Date: 2017-09-11T03:59:25Z SPARK-24335 Spark external shuffle server improvement to better handle block fetch requests. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
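The core idea — giving the disk-heavy request type its own thread pool so it cannot starve everything else — can be sketched directly with Netty. This is a hypothetical illustration, not the actual patch: the handler names and pool size are invented, and the real change lives in Spark's network layer.

```scala
import io.netty.channel.{ChannelInboundHandlerAdapter, ChannelPipeline}
import io.netty.channel.nio.NioEventLoopGroup
import io.netty.channel.socket.SocketChannel

// Hypothetical handlers standing in for Spark's real transport handlers.
class MainTransportHandler extends ChannelInboundHandlerAdapter

class ChunkFetchHandler extends ChannelInboundHandlerAdapter {
  // channelRead here would do the disk-heavy work of reading shuffle blocks and may block
}

object DedicatedFetchThreads {
  // A separate, bounded pool reserved for the blocking chunk-fetch path.
  val chunkFetchGroup = new NioEventLoopGroup(8)

  def configure(ch: SocketChannel): Unit = {
    val pipeline: ChannelPipeline = ch.pipeline()
    // Ordinary handlers keep running on the channel's own event loop.
    pipeline.addLast("transportHandler", new MainTransportHandler)
    // This handler is executed on chunkFetchGroup instead, so blocking on
    // shuffle-file reads cannot exhaust the main netty handler threads.
    pipeline.addLast(chunkFetchGroup, "chunkFetchHandler", new ChunkFetchHandler)
  }
}
```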
[GitHub] spark issue #21402: SPARK-24335 Spark external shuffle server improvement to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21402 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21082: [SPARK-22239][SQL][Python] Enable grouped aggregate pand...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21082 @ueshin I responded to your comments. Please let me know what you think, thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21082: [SPARK-22239][SQL][Python] Enable grouped aggrega...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21082#discussion_r190065854 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -268,3 +269,38 @@ object PhysicalAggregation { case _ => None } } + +/** + * An extractor used when planning physical execution of a window. This extractor outputs + * the window function type of the logical window. + * + * The input logical window must contain same type of window functions, which is ensured by + * the rule ExtractWindowExpressions in the analyzer. + */ +object PhysicalWindow { + // windowFunctionType, windowExpression, partitionSpec, orderSpec, child + type ReturnType = +(WindowFunctionType, Seq[NamedExpression], Seq[Expression], Seq[SortOrder], LogicalPlan) + + def unapply(a: Any): Option[ReturnType] = a match { +case expr @ logical.Window(windowExpressions, partitionSpec, orderSpec, child) => + + if (windowExpressions.isEmpty) { +throw new AnalysisException(s"Window expression is empty in $expr") --- End diff -- I added comments to explain this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21382: [SPARK-24332][SS][MESOS]Fix places reading 'spark.networ...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/21382 @felixcheung `spark.network.timeout` is one of the weirdest configs, as it may have different default values in different places. I think this is why it's not added to the global `org.apache.spark.internal.config`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
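To illustrate what "different default values in different places" can look like (the call sites and fallback values below are invented; only the pattern matters): each component reads the same key with its own fallback, which a single global ConfigEntry default cannot express.

```scala
import org.apache.spark.SparkConf

object NetworkTimeoutDefaults {
  private val conf = new SparkConf()

  // Hypothetical call sites: same key, component-specific fallbacks.
  val rpcAskTimeoutSec: Long    = conf.getTimeAsSeconds("spark.network.timeout", "120s")
  val heartbeatTimeoutSec: Long = conf.getTimeAsSeconds("spark.network.timeout", "240s")
}
```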
[GitHub] spark issue #21082: [SPARK-22239][SQL][Python] Enable grouped aggregate pand...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21082 **[Test build #90996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90996/testReport)** for PR 21082 at commit [`ed02823`](https://github.com/apache/spark/commit/ed028237d9892e8d1e32607e77dad24ad0252b05). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21401: [SPARK-24350][SQL] "array_position" error fix
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21401 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90990/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21401: [SPARK-24350][SQL] "array_position" error fix
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21401 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21401: [SPARK-24350][SQL] "array_position" error fix
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21401 **[Test build #90990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90990/testReport)** for PR 21401 at commit [`8e8f623`](https://github.com/apache/spark/commit/8e8f62385b3a81cd21b5fd033a5dee9bc7d8a040). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21331: [SPARK-24276][SQL] Order of literals in IN should...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21331#discussion_r190062760 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala --- @@ -85,6 +87,10 @@ object Canonicalize { case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r) case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r) +// order the list in the In operator +case In(value, list) => + In(value, list.sortBy(_.semanticHash())) --- End diff -- A few general suggestions:
- We need to add test cases when we want to cover the extra cases.
- Update the comments in the description when we add new code.
- Try our best to make the code simple.
- Find the best test suite when we add a new test. If unable to find a good one, create a new test suite.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21082: [SPARK-22239][SQL][Python] Enable grouped aggrega...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21082#discussion_r190061061 --- Diff: python/pyspark/sql/tests.py --- @@ -5181,6 +5190,235 @@ def test_invalid_args(self): 'mixture.*aggregate function.*group aggregate pandas UDF'): df.groupby(df.id).agg(mean_udf(df.v), mean(df.v)).collect() + +@unittest.skipIf( +not _have_pandas or not _have_pyarrow, +_pandas_requirement_message or _pyarrow_requirement_message) +class WindowPandasUDFTests(ReusedSQLTestCase): +@property +def data(self): +from pyspark.sql.functions import array, explode, col, lit +return self.spark.range(10).toDF('id') \ +.withColumn("vs", array([lit(i * 1.0) + col('id') for i in range(20, 30)])) \ +.withColumn("v", explode(col('vs'))) \ +.drop('vs') \ +.withColumn('w', lit(1.0)) + +@property +def python_plus_one(self): --- End diff -- I agree those should be shared, but let's do it maybe in a separate PR? Because the refactor will probably touch many test cases and the PR is quite large already.. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21082: [SPARK-22239][SQL][Python] Enable grouped aggrega...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21082#discussion_r190060687 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -268,3 +269,38 @@ object PhysicalAggregation { case _ => None } } + +/** + * An extractor used when planning physical execution of a window. This extractor outputs + * the window function type of the logical window. + * + * The input logical window must contain same type of window functions, which is ensured by + * the rule ExtractWindowExpressions in the analyzer. + */ +object PhysicalWindow { + // windowFunctionType, windowExpression, partitionSpec, orderSpec, child + type ReturnType = +(WindowFunctionType, Seq[NamedExpression], Seq[Expression], Seq[SortOrder], LogicalPlan) + + def unapply(a: Any): Option[ReturnType] = a match { +case expr @ logical.Window(windowExpressions, partitionSpec, orderSpec, child) => + + if (windowExpressions.isEmpty) { +throw new AnalysisException(s"Window expression is empty in $expr") --- End diff -- Window expression should not be empty here so this should not be reached. This is just for safety. Let me add a comment. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21082: [SPARK-22239][SQL][Python] Enable grouped aggrega...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21082#discussion_r190060086 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1869,6 +1870,8 @@ class Analyzer( case window: WindowExpression => window.windowSpec }.distinct +val windowFunctionType = WindowFunctionType.functionType(expr) --- End diff -- This is used for grouping window functions into SQL window functions and pandas UDFs, and for creating different logical nodes for them. The value is used here: https://github.com/apache/spark/pull/21082/files#diff-57b3d87be744b7d79a9beacf8e5e5eb2R1886 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
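Conceptually the grouping works like this (a simplified, hypothetical sketch — the real logic lives in the analyzer and uses Spark's internal expression types):

```scala
// Hypothetical stand-ins for the two kinds of window functions the analyzer distinguishes.
sealed trait WindowFnType
case object SqlWindowFn extends WindowFnType
case object PandasUdfWindowFn extends WindowFnType

case class WindowExpr(name: String, fnType: WindowFnType)

object WindowGrouping {
  // Group the window expressions of one window spec by function type, so that a
  // separate logical Window node can be planned for each group.
  def splitByFunctionType(exprs: Seq[WindowExpr]): Map[WindowFnType, Seq[WindowExpr]] =
    exprs.groupBy(_.fnType)
}

// e.g. WindowGrouping.splitByFunctionType(Seq(
//   WindowExpr("rank", SqlWindowFn), WindowExpr("mean_udf", PandasUdfWindowFn)))
// yields two groups, one per logical Window node.
```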
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21372 I didn't say ORC-301 resolves the issues of SPARK-23458 and SPARK-23390. SPARK-23458 and SPARK-23390 report open file leakages in some unknown situations, don't they? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21394: [SPARK-24329][SQL] Test for skipping multi-space lines
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21394 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21394: [SPARK-24329][SQL] Test for skipping multi-space lines
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21394 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90988/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21394: [SPARK-24329][SQL] Test for skipping multi-space lines
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21394 **[Test build #90988 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90988/testReport)** for PR 21394 at commit [`70839e8`](https://github.com/apache/spark/commit/70839e8bf56338698d750a9d7830dca29ae13dce). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class TaskKilled(` * `trait HasValidationIndicatorCol extends Params ` * `trait DivModLike extends BinaryArithmetic ` * `case class Divide(left: Expression, right: Expression) extends DivModLike ` * `case class Remainder(left: Expression, right: Expression) extends DivModLike ` * `class ReadOnlySQLConf(context: TaskContext) extends SQLConf ` * `class TaskContextConfigProvider(context: TaskContext) extends ConfigProvider ` * `case class ContinuousShuffleReadPartition(index: Int, queueSize: Int) extends Partition ` * `class ContinuousShuffleReadRDD(` * `trait ContinuousShuffleReader ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #9503: [SPARK-9818] Re-enable Docker tests for JDBC data source
Github user nonsleepr commented on the issue: https://github.com/apache/spark/pull/9503 Adding following for better visibility. The problem while using [`com.whisk:docker-testkit-impl-spotify`](https://github.com/whisklabs/docker-it-scala) along with Spark: ```log com.spotify.docker.client.exceptions.DockerException: java.util.concurrent.ExecutionException: javax.ws.rs.ProcessingException: java.lang.IncompatibleClassChangeError: Found interface org.objectweb.asm.ClassVisitor, but class was expected at com.spotify.docker.client.DefaultDockerClient.propagate(DefaultDockerClient.java:2657) at com.spotify.docker.client.DefaultDockerClient.request(DefaultDockerClient.java:2588) at com.spotify.docker.client.DefaultDockerClient.info(DefaultDockerClient.java:522) ... 40 elided Caused by: java.util.concurrent.ExecutionException: javax.ws.rs.ProcessingException: java.lang.IncompatibleClassChangeError: Found interface org.objectweb.asm.ClassVisitor, but class was expected at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) at com.spotify.docker.client.DefaultDockerClient.request(DefaultDockerClient.java:2586) ... 41 more Caused by: javax.ws.rs.ProcessingException: java.lang.IncompatibleClassChangeError: Found interface org.objectweb.asm.ClassVisitor, but class was expected at org.glassfish.jersey.client.ClientRuntime.processFailure(ClientRuntime.java:202) at org.glassfish.jersey.client.ClientRuntime.access$400(ClientRuntime.java:79) at org.glassfish.jersey.client.ClientRuntime$2.run(ClientRuntime.java:182) at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) at org.glassfish.jersey.internal.Errors.process(Errors.java:315) at org.glassfish.jersey.internal.Errors.process(Errors.java:297) at org.glassfish.jersey.internal.Errors.process(Errors.java:267) at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:340) at org.glassfish.jersey.client.ClientRuntime$3.run(ClientRuntime.java:210) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ... 
1 more Caused by: java.lang.IncompatibleClassChangeError: Found interface org.objectweb.asm.ClassVisitor, but class was expected at jnr.ffi.provider.jffi.AsmLibraryLoader.generateInterfaceImpl(AsmLibraryLoader.java:74) at jnr.ffi.provider.jffi.AsmLibraryLoader.loadLibrary(AsmLibraryLoader.java:59) at jnr.ffi.provider.jffi.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:43) at jnr.ffi.LibraryLoader.load(LibraryLoader.java:287) at jnr.unixsocket.Native.(Native.java:76) at jnr.unixsocket.UnixSocketChannel.(UnixSocketChannel.java:68) at jnr.unixsocket.UnixSocketChannel.open(UnixSocketChannel.java:49) at com.spotify.docker.client.ApacheUnixSocket.(ApacheUnixSocket.java:59) at com.spotify.docker.client.UnixConnectionSocketFactory.createSocket(UnixConnectionSocketFactory.java:67) at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:118) at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353) at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380) at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236) at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184) at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88) at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:71) at org.glassfish.jersey.apache.connector.ApacheConnector.apply(ApacheConnector.java:435) at org.glassfish.jersey.apache.connector.ApacheConnector$1.run(ApacheConnector.java:491) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at jersey.repackaged.com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:299) at
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21372 For me, those three lines do not throw exceptions. Do you mean other lines?
```
OrcProto.PostScript ps;
OrcProto.FileTail.Builder fileTailBuilder = OrcProto.FileTail.newBuilder();
long modificationTime;
```
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21372 I am just trying to find out why ORC-301 resolves the issues of SPARK-23458 and SPARK-23390 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21372 https://github.com/dongjoon-hyun/orc/blob/cad48d6b11a65264a5b22c73aa2be9029aa72988/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L520-L522 Regarding the file leakage, I did not see any exception issued from these three lines in our log. Does that mean ORC eats the exceptions and attempts to re-open the files? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
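For context, the failure mode being discussed is the generic swallow-the-exception file-handle leak. The sketch below is not the ORC code, just an illustration of the pattern, with a plain FileInputStream standing in for the ORC reader's stream.

```scala
import java.io.FileInputStream

object FileLeakSketch {
  // Leaky pattern: if parsing throws and the exception is caught (or retried)
  // without closing `in`, the old file handle is never released.
  def leaky(path: String): Unit = {
    val in = new FileInputStream(path)
    try {
      throw new RuntimeException("simulated footer parse error")
    } catch {
      case _: RuntimeException => // swallowed: `in` is never closed here
    }
  }

  // Safe pattern: the handle is released even when parsing fails.
  def safe(path: String): Unit = {
    val in = new FileInputStream(path)
    try {
      // ... read and parse the file footer ...
    } finally {
      in.close()
    }
  }
}
```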
[GitHub] spark issue #21399: [SPARK-22269][BUILD] Run Java linter via SBT for Jenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21399 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90984/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21399: [SPARK-22269][BUILD] Run Java linter via SBT for Jenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21399 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21399: [SPARK-22269][BUILD] Run Java linter via SBT for Jenkins
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21399 **[Test build #90984 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90984/testReport)** for PR 21399 at commit [`7bb0eb3`](https://github.com/apache/spark/commit/7bb0eb3be6619ea9d0c7a023da5b665fecbc799e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20997 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21372 For the Timestamp issue, I'm trying to find an example. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21387: Correct reference to Offset class
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21387 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21387: Correct reference to Offset class
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90987/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21387: Correct reference to Offset class
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21387 **[Test build #90987 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90987/testReport)** for PR 21387 at commit [`81a5d3d`](https://github.com/apache/spark/commit/81a5d3fb31e4b7f9540c7b1ca432df44d79f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20997 That being the case, merging to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21372 For the file leakage issues, we have been monitoring the flakiness of SPARK-23458 and SPARK-23390 in our Jenkins environment. So far, I haven't been able to reproduce it locally. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21331: [SPARK-24276][SQL] Order of literals in IN should not af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21331 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21372 @gatorsmile. Basically, ORC-301 will reduce the chance of ORC file leakage in some cases. I made that patch and merged it a long time ago, but it is only released in this version. Also, ORC-306 fixes a bug involving Java `Timestamp` and its ORC workaround. Please see [here for the details of the Java Timestamp bug and the issue with the previous ORC workaround](https://issues.apache.org/jira/browse/ORC-306). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21331: [SPARK-24276][SQL] Order of literals in IN should not af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21331 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3476/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/21372#discussion_r190042175 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java --- @@ -136,7 +136,7 @@ public int getInt(int rowId) { public long getLong(int rowId) { int index = getRowIndex(rowId); if (isTimestamp) { - return timestampData.time[index] * 1000 + timestampData.nanos[index] / 1000; + return timestampData.time[index] * 1000 + timestampData.nanos[index] / 1000 % 1000; --- End diff -- No, what I mean is, with ORC-306 and this fix, there is no external impact outside Spark. More specifically, outside `OrcColumnVector`/`OrcColumnarBatchReader`. In other words, ORC 1.4.4 cannot be used with Apache Spark without this patch. Java `Timestamp.getTime` and `Timestamp.getNanos` have an overlap by definition. Previously, ORC didn't stick to the definition. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
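A quick worked check of the fixed expression, using `java.sql.Timestamp` directly as a stand-in for the column vector's `time`/`nanos` arrays (illustrative values only): `getTime()` already contains the millisecond part of the fraction, and `getNanos()` contains the whole fractional second, so only the sub-millisecond remainder should be added back when converting to microseconds.

```scala
import java.sql.Timestamp

object TimestampMicrosCheck extends App {
  // 1 ms plus 234 µs of fractional second.
  val ts = new Timestamp(1L)   // getTime == 1 (ms; already includes the millisecond part)
  ts.setNanos(1234000)         // full fractional second in nanos: 1_234_000

  // Old expression double-counts the millisecond part:
  val wrongMicros = ts.getTime * 1000 + ts.getNanos / 1000         // 1000 + 1234 = 2234
  // Fixed expression keeps only the sub-millisecond remainder of the nanos:
  val micros      = ts.getTime * 1000 + ts.getNanos / 1000 % 1000  // 1000 + 234  = 1234

  println(s"wrong = $wrongMicros, fixed = $micros")
}
```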
[GitHub] spark issue #21331: [SPARK-24276][SQL] Order of literals in IN should not af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21331 **[Test build #90995 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90995/testReport)** for PR 21331 at commit [`0c484f1`](https://github.com/apache/spark/commit/0c484f16a7bcb0f4b476decdf6820c4d89d4cfb0). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21331: [SPARK-24276][SQL] Order of literals in IN should...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21331#discussion_r190038976 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala --- @@ -85,6 +87,10 @@ object Canonicalize { case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r) case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r) +// order the list in the In operator +case In(value, list) => + In(value, list.sortBy(_.semanticHash())) --- End diff -- The only difference is that the elements in the list are canonicalized before the hash. I can't think of any meaningful example. The only one I can think of is something like ` mybool in (5 < 2)` and ` mybool in (not 5 >= 2)`. But in the future we may have more rules here that make using `semanticHash` meaningful, which is logically what we want here IMHO. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
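A toy model of what ordering the IN list buys (hypothetical types, not Spark's Expression hierarchy): after sorting children by a semantic hash, `a IN (2, 1)` and `a IN (1, 2)` canonicalize to the same form and therefore compare as semantically equal.

```scala
object InCanonicalizeSketch extends App {
  // Minimal stand-in for an expression tree with a semantic hash.
  sealed trait Expr { def semanticHash: Int = hashCode() }
  case class Attr(name: String) extends Expr
  case class Lit(value: Int) extends Expr
  case class In(value: Expr, list: Seq[Expr]) extends Expr

  // Simplified canonicalization rule: order the IN list deterministically.
  def canonicalize(e: Expr): Expr = e match {
    case In(v, list) => In(v, list.sortBy(_.semanticHash))
    case other       => other
  }

  val a = Attr("a")
  val e1 = In(a, Seq(Lit(2), Lit(1)))
  val e2 = In(a, Seq(Lit(1), Lit(2)))
  assert(canonicalize(e1) == canonicalize(e2))  // literal order no longer matters
  println("canonical forms match")
}
```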
[GitHub] spark pull request #21045: [SPARK-23931][SQL] Adds zip function to sparksql
Github user DylanGuedes commented on a diff in the pull request: https://github.com/apache/spark/pull/21045#discussion_r190036385 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -127,6 +127,148 @@ case class MapKeys(child: Expression) override def prettyName: String = "map_keys" } +@ExpressionDescription( + usage = """_FUNC_(a1, a2, ...) - Returns a merged array containing in the N-th position the + N-th value of each array given.""", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), array(2, 3, 4)); +[[1, 2], [2, 3], [3, 4]] + > SELECT _FUNC_(array(1, 2), array(2, 3), array(3, 4)); +[[1, 2, 3], [2, 3, 4]] + """, + since = "2.4.0") +case class Zip(children: Seq[Expression]) extends Expression with ExpectsInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq.fill(children.length)(ArrayType) + + override def dataType: DataType = ArrayType(mountSchema) + + override def prettyName: String = "zip" + + override def nullable: Boolean = children.forall(_.nullable) + + lazy val numberOfArrays: Int = children.length + + private lazy val arrayTypes = children.map(_.dataType.asInstanceOf[ArrayType]) + + private lazy val arrayElementTypes = arrayTypes.map(_.elementType) + + def mountSchema: StructType = { +val fields = arrayTypes.zipWithIndex.foldRight(List[StructField]()) { case ((arr, idx), list) => + StructField(s"_$idx", arr.elementType, children(idx).nullable || arr.containsNull) :: list +} +StructType(fields) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val genericArrayData = classOf[GenericArrayData].getName +val genericInternalRow = classOf[GenericInternalRow].getName + +val evals = children.map(_.genCode(ctx)) + +val arrVals = ctx.freshName("arrVals") +val arrCardinality = ctx.freshName("arrCardinality") +val biggestCardinality = ctx.freshName("biggestCardinality") +val storedArrTypes = ctx.freshName("storedArrTypes") + +val inputs = evals.zipWithIndex.map { case (eval, index) => + s""" +|${eval.code} +|if (!${eval.isNull}) { +| $arrVals[$index] = ${eval.value}; +| $arrCardinality[$index] = ${eval.value}.numElements(); +|} else { +| $arrVals[$index] = null; +| $arrCardinality[$index] = 0; +|} +|$storedArrTypes[$index] = "${arrayElementTypes(index)}"; +|$biggestCardinality = Math.max($biggestCardinality, $arrCardinality[$index]); + """.stripMargin +} + +val inputsSplitted = ctx.splitExpressions( + expressions = inputs, + funcName = "getInputAndCardinality", + returnType = "int", + makeSplitFunction = body => +s""" + |$body + |return $biggestCardinality; +""".stripMargin, + foldFunctions = _.map(funcCall => s"$biggestCardinality = $funcCall;").mkString("\n"), + arguments = +("ArrayData[]", arrVals) :: +("int[]", arrCardinality) :: +("String[]", storedArrTypes) :: +("int", biggestCardinality) :: Nil) + +val myobject = ctx.freshName("myobject") +val j = ctx.freshName("j") +val i = ctx.freshName("i") +val args = ctx.freshName("args") + +val fillValue = arrayElementTypes.distinct.map { case (elementType) => + val getArrValsItem = CodeGenerator.getValue(s"$arrVals[$j]", elementType, i) + s""" + |if ($storedArrTypes[$j] == "${elementType}") { + | $myobject[$j] = $getArrValsItem; + |} --- End diff -- I don't get it, how? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21398 **[Test build #90994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90994/testReport)** for PR 21398 at commit [`5f64a95`](https://github.com/apache/spark/commit/5f64a95e9557feb3366cf2ac986de6da8ce165a4). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21395: [SPARK-24348][SQL] "element_at" error fix
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21395 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21395: [SPARK-24348][SQL] "element_at" error fix
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21395 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90986/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21395: [SPARK-24348][SQL] "element_at" error fix
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21395 **[Test build #90986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90986/testReport)** for PR 21395 at commit [`045307b`](https://github.com/apache/spark/commit/045307bc4088ca471e16eb1a48c3cc1b75e8bf61). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21395: [SPARK-24348][SQL] "element_at" error fix
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21395 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21045: [SPARK-23931][SQL] Adds zip function to sparksql
Github user DylanGuedes commented on a diff in the pull request: https://github.com/apache/spark/pull/21045#discussion_r190034892 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -127,6 +127,148 @@ case class MapKeys(child: Expression) override def prettyName: String = "map_keys" } +@ExpressionDescription( + usage = """_FUNC_(a1, a2, ...) - Returns a merged array containing in the N-th position the + N-th value of each array given.""", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), array(2, 3, 4)); +[[1, 2], [2, 3], [3, 4]] + > SELECT _FUNC_(array(1, 2), array(2, 3), array(3, 4)); +[[1, 2, 3], [2, 3, 4]] + """, + since = "2.4.0") +case class Zip(children: Seq[Expression]) extends Expression with ExpectsInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq.fill(children.length)(ArrayType) + + override def dataType: DataType = ArrayType(mountSchema) + + override def prettyName: String = "zip" + + override def nullable: Boolean = children.forall(_.nullable) + + lazy val numberOfArrays: Int = children.length + + private lazy val arrayTypes = children.map(_.dataType.asInstanceOf[ArrayType]) + + private lazy val arrayElementTypes = arrayTypes.map(_.elementType) + + def mountSchema: StructType = { +val fields = arrayTypes.zipWithIndex.foldRight(List[StructField]()) { case ((arr, idx), list) => + StructField(s"_$idx", arr.elementType, children(idx).nullable || arr.containsNull) :: list +} +StructType(fields) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val genericArrayData = classOf[GenericArrayData].getName +val genericInternalRow = classOf[GenericInternalRow].getName + +val evals = children.map(_.genCode(ctx)) + +val arrVals = ctx.freshName("arrVals") +val arrCardinality = ctx.freshName("arrCardinality") +val biggestCardinality = ctx.freshName("biggestCardinality") +val storedArrTypes = ctx.freshName("storedArrTypes") + +val inputs = evals.zipWithIndex.map { case (eval, index) => + s""" +|${eval.code} +|if (!${eval.isNull}) { +| $arrVals[$index] = ${eval.value}; +| $arrCardinality[$index] = ${eval.value}.numElements(); +|} else { +| $arrVals[$index] = null; +| $arrCardinality[$index] = 0; +|} +|$storedArrTypes[$index] = "${arrayElementTypes(index)}"; +|$biggestCardinality = Math.max($biggestCardinality, $arrCardinality[$index]); + """.stripMargin +} + +val inputsSplitted = ctx.splitExpressions( + expressions = inputs, + funcName = "getInputAndCardinality", + returnType = "int", + makeSplitFunction = body => +s""" + |$body + |return $biggestCardinality; +""".stripMargin, + foldFunctions = _.map(funcCall => s"$biggestCardinality = $funcCall;").mkString("\n"), + arguments = +("ArrayData[]", arrVals) :: +("int[]", arrCardinality) :: +("String[]", storedArrTypes) :: +("int", biggestCardinality) :: Nil) + +val myobject = ctx.freshName("myobject") +val j = ctx.freshName("j") +val i = ctx.freshName("i") +val args = ctx.freshName("args") + +val fillValue = arrayElementTypes.distinct.map { case (elementType) => + val getArrValsItem = CodeGenerator.getValue(s"$arrVals[$j]", elementType, i) + s""" + |if ($storedArrTypes[$j] == "${elementType}") { + | $myobject[$j] = $getArrValsItem; + |} + """.stripMargin +}.mkString("\n") + +ev.copy(s""" + |ArrayData[] $arrVals = new ArrayData[$numberOfArrays]; + |int[] $arrCardinality = new int[$numberOfArrays]; + |int $biggestCardinality = 0; + |String[] $storedArrTypes = new String[$numberOfArrays]; + |$inputsSplitted + |Object[] $args = 
new Object[$biggestCardinality]; + |for (int $i = 0; $i < $biggestCardinality; $i ++) { + | Object[] $myobject = new Object[$numberOfArrays]; + | for (int $j = 0; $j < $numberOfArrays; $j ++) { + |if ($arrVals[$j] != null && $arrCardinality[$j] > $i && !$arrVals[$j].isNullAt($i)) { + | $fillValue + |} else { + | $myobject[$j] = null; + |} + | } + | $args[$i] = new $genericInternalRow($myobject); + |} + |boolean ${ev.isNull} = false; --- End diff -- @ueshin After some time
[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20929 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90979/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20929 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21069 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90985/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21069 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20929 **[Test build #90979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90979/testReport)** for PR 20929 at commit [`6c4592d`](https://github.com/apache/spark/commit/6c4592d2b8f008adfed79a31a8373641d4f4f550). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21395: [SPARK-24348][SQL] "element_at" error fix
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21395#discussion_r190033826 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1469,6 +1469,7 @@ case class ElementAt(left: Expression, right: Expression) extends GetMapValueUti left.dataType match { case _: ArrayType => IntegerType case _: MapType => left.dataType.asInstanceOf[MapType].keyType +case _ => AnyDataType // no match for a wrong 'left' expression type --- End diff -- BTW, normally, we prefer to
```
override def inputTypes: Seq[AbstractDataType] = {
  val rightInputType = left.dataType match {
    case _: ArrayType => IntegerType
    case _: MapType => left.dataType.asInstanceOf[MapType].keyType
    case _ => AnyDataType // no match for a wrong 'left' expression type
  }
  Seq(TypeCollection(ArrayType, MapType), rightInputType)
}
```
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21398 **[Test build #90992 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90992/testReport)** for PR 21398 at commit [`0f4be31`](https://github.com/apache/spark/commit/0f4be3110df5f7a92313260872994bc18685332e). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21398 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21069 **[Test build #90985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90985/testReport)** for PR 21069 at commit [`9281ae2`](https://github.com/apache/spark/commit/9281ae233dc54dd961e99e345be559929232c148). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21398 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90992/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20894 **[Test build #90993 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90993/testReport)** for PR 20894 at commit [`7dce1e7`](https://github.com/apache/spark/commit/7dce1e72f080044ba471fa08bb572452d4b0c907). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21398 **[Test build #90992 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90992/testReport)** for PR 21398 at commit [`0f4be31`](https://github.com/apache/spark/commit/0f4be3110df5f7a92313260872994bc18685332e). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21399: [SPARK-22269][BUILD] Run Java linter via SBT for Jenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21399 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21399: [SPARK-22269][BUILD] Run Java linter via SBT for Jenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21399 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90977/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21399: [SPARK-22269][BUILD] Run Java linter via SBT for Jenkins
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21399 **[Test build #90977 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90977/testReport)** for PR 21399 at commit [`558b1be`](https://github.com/apache/spark/commit/558b1beaa73cc536588c9a99b291ea7dcd72a1ff). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21398 (Although I'm pretty sure it will fail style checks.) Also, @cloud-fan since you wrote the original change. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21398 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21372 A few basic questions about this upgrade. What are the benefits of these nine trivial patches? If there is no impact on Spark users, we should not upgrade; if the new release fixes bugs, we need to add test cases to verify the fixes. Please prove the necessity of the upgrade. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21372#discussion_r190030340 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java --- @@ -136,7 +136,7 @@ public int getInt(int rowId) { public long getLong(int rowId) { int index = getRowIndex(rowId); if (isTimestamp) { - return timestampData.time[index] * 1000 + timestampData.nanos[index] / 1000; + return timestampData.time[index] * 1000 + timestampData.nanos[index] / 1000 % 1000; --- End diff -- Are you saying no external impact of ORC-306? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21399: [SPARK-22269][BUILD] Run Java linter via SBT for Jenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21399 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90980/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21399: [SPARK-22269][BUILD] Run Java linter via SBT for Jenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21399 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21399: [SPARK-22269][BUILD] Run Java linter via SBT for Jenkins
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21399 **[Test build #90980 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90980/testReport)** for PR 21399 at commit [`54d070d`](https://github.com/apache/spark/commit/54d070d2517c2dd1db33347131f049d0d08f73dc). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...
Github user szyszy commented on the issue: https://github.com/apache/spark/pull/20761 @vanzin, @galv: I think all of your comments are either addressed or answered; please check once again! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21397 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3475/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21397 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21395: [SPARK-24348][SQL] "element_at" error fix
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21395 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90974/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask and Use `whi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21381 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90975/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user szyszy commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r190024350 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceTypeHelper.scala --- @@ -0,0 +1,180 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.yarn + +import java.lang.reflect.InvocationTargetException + +import scala.util.control.NonFatal + +import org.apache.hadoop.yarn.api.records.Resource + +import org.apache.spark.internal.Logging +import org.apache.spark.util.Utils + +/** + * This helper class uses some of Hadoop 3 methods from the yarn API, + * so we need to use reflection to avoid compile error when building against Hadoop 2.x + */ +object ResourceTypeHelper extends Logging { + private val AMOUNT_AND_UNIT_REGEX = "([0-9]+)([A-Za-z]*)".r + private val resourceTypesNotAvailableErrorMessage = +"Ignoring updating resource with resource types because " + +"the version of YARN does not support it!" + + def setResourceInfoFromResourceTypes( + resourceTypesParam: Map[String, String], + resource: Resource): Resource = { +if (resource == null) { + throw new IllegalArgumentException("Resource parameter should not be null!") +} + +if (!ResourceTypeHelper.isYarnResourceTypesAvailable()) { + logWarning(resourceTypesNotAvailableErrorMessage) + return resource +} + +val resourceTypes = resourceTypesParam.map { case (k, v) => ( + if (k.equals("memory")) { +logWarning("Trying to use 'memory' as a custom resource, converted it to 'memory-mb'") +"memory-mb" + } else k, v) +} + +logDebug(s"Custom resource types: $resourceTypes") +resourceTypes.foreach { rt => + val resourceName: String = rt._1 + val (amount, unit) = getAmountAndUnit(rt._2) + logDebug(s"Registering resource with name: $resourceName, amount: $amount, unit: $unit") + + try { +val resInfoClass = Utils.classForName( + "org.apache.hadoop.yarn.api.records.ResourceInformation") +val setResourceInformationMethod = + resource.getClass.getMethod("setResourceInformation", classOf[String], +resInfoClass) + +val resourceInformation = + createResourceInformation(resourceName, amount, unit, resInfoClass) +setResourceInformationMethod.invoke(resource, resourceName, resourceInformation) + } catch { +case e: InvocationTargetException => + if (e.getCause != null) { +throw e.getCause + } else { +throw e + } +case NonFatal(e) => + logWarning(resourceTypesNotAvailableErrorMessage, e) + } +} +resource + } + + def getCustomResourcesAsStrings(resource: Resource): String = { +if (resource == null) { + throw new IllegalArgumentException("Resource parameter should not be null!") +} + +if (!ResourceTypeHelper.isYarnResourceTypesAvailable()) { + logWarning(resourceTypesNotAvailableErrorMessage) + return "" +} + +var res: String = "" 
+try { + val resUtilsClass = Utils.classForName( +"org.apache.hadoop.yarn.util.resource.ResourceUtils") + val getNumberOfResourceTypesMethod = resUtilsClass.getMethod("getNumberOfKnownResourceTypes") + val numberOfResourceTypes: Int = getNumberOfResourceTypesMethod.invoke(null).asInstanceOf[Int] + val resourceClass = Utils.classForName( +"org.apache.hadoop.yarn.api.records.Resource") + + // skip memory and vcores (index 0 and 1) + for (i <- 2 until numberOfResourceTypes) { +val getResourceInfoMethod = resourceClass.getMethod("getResourceInformation", + classOf[Int]) +
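As a side note for readers following the reflection-heavy diff above, a minimal standalone sketch of the same pattern — calling the Hadoop 3-only `setResourceInformation` API without a compile-time dependency — may help; the `newInstance` factory call and the silent fallback are assumptions for illustration, not code from the PR:

```scala
import org.apache.hadoop.yarn.api.records.Resource
import org.apache.spark.util.Utils

// Sketch: set a custom resource type on a YARN Resource via reflection so the
// code still compiles (and degrades gracefully) against Hadoop 2.x.
def trySetResource(resource: Resource, name: String, amount: Long): Unit = {
  try {
    val resInfoClass = Utils.classForName(
      "org.apache.hadoop.yarn.api.records.ResourceInformation")
    // static ResourceInformation.newInstance(String name, long value)
    val newInstance = resInfoClass.getMethod("newInstance", classOf[String], classOf[Long])
    val resInfo = newInstance.invoke(null, name, java.lang.Long.valueOf(amount))
    val setter = resource.getClass.getMethod(
      "setResourceInformation", classOf[String], resInfoClass)
    setter.invoke(resource, name, resInfo)
  } catch {
    case _: ClassNotFoundException | _: NoSuchMethodException =>
      // Hadoop 2.x: custom resource types are not supported; skip silently in this sketch.
  }
}
```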
[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask and Use `whi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21381 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21395: [SPARK-24348][SQL] "element_at" error fix
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21395 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21395: [SPARK-24348][SQL] "element_at" error fix
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21395 **[Test build #90974 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90974/testReport)** for PR 21395 at commit [`ec2e4a6`](https://github.com/apache/spark/commit/ec2e4a6e60e93afbfdce918f63f43a1862f52f83). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask and Use `whi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21381 **[Test build #90975 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90975/testReport)** for PR 21381 at commit [`54b3b5f`](https://github.com/apache/spark/commit/54b3b5f1c973638d4c32d2b297c8b7c9ff72f28a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21380: [SPARK-24329][SQL] Remove comments filtering befo...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21380#discussion_r190023523 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -300,14 +302,11 @@ private[csv] object UnivocityParser { lines } -val filteredLines: Iterator[String] = - CSVUtils.filterCommentAndEmpty(linesWithoutHeader, options) --- End diff -- Actually, the test from #21394 shows a case where this PR changes behavior: empty lines consisting of multiple whitespaces, with `ignoreLeadingWhiteSpace` set to `false` (the default), produce `null`s. The uniVocity parser can ignore lines of multiple whitespaces only when `ignoreLeadingWhiteSpace` (or `ignoreTrailingWhiteSpace`) is set to `true`. So there is no combination of CSV options that preserves the default behavior of the current implementation. I would propose closing this PR and adding the test from #21394 to CSVSuite to make sure we do not break the behavior described above. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
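To make the scenario concrete, here is a small self-contained reproduction of the setup being discussed; the file contents, schema, and option values are illustrative assumptions rather than the exact test in #21394:

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("ws-lines").getOrCreate()

// A CSV file whose second line contains only spaces.
val path = Files.createTempFile("ws-lines", ".csv")
Files.write(path, "1,a\n   \n2,b\n".getBytes("UTF-8"))

val df = spark.read
  .schema("c0 INT, c1 STRING")
  // false is the default for reading; per the discussion above, with this value the
  // uniVocity parser does not treat the whitespace-only line as empty, so removing
  // Spark's own empty-line filtering can surface it as a row of nulls.
  .option("ignoreLeadingWhiteSpace", "false")
  .csv(path.toString)

df.show()
```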
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21397 @BryanCutler @HyukjinKwon I was able to reproduce it in a unit test. Please take a look, thanks! :) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21397 **[Test build #90991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90991/testReport)** for PR 21397 at commit [`69c9104`](https://github.com/apache/spark/commit/69c91043981056a732328b474986fe4128936c62). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20997 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20997 **[Test build #90989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90989/testReport)** for PR 20997 at commit [`6cd67c6`](https://github.com/apache/spark/commit/6cd67c6ac7b948eb791cc4871477ab0b1df4fcad). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20997 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90989/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...
Github user yuchaoran2011 commented on the issue: https://github.com/apache/spark/pull/21398 Thanks @vanzin. I've verified this commit has all the changes you mentioned. I've also removed references to CDH in the code comments. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21401: [SPARK-24350][SQL] "array_position" error fix
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21401 **[Test build #90990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90990/testReport)** for PR 21401 at commit [`8e8f623`](https://github.com/apache/spark/commit/8e8f62385b3a81cd21b5fd033a5dee9bc7d8a040). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21401: [SPARK-24350][SQL] "array_position" error fix
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21401 ok to test. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/20997 I'm fine as well. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21397 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21397 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90972/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21397 **[Test build #90972 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90972/testReport)** for PR 21397 at commit [`435ccff`](https://github.com/apache/spark/commit/435ccfff44995ca5ad487e77128b2cae4ff1cfd5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20929 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20929 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90978/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21045: [SPARK-23931][SQL] Adds zip function to sparksql
Github user DylanGuedes commented on a diff in the pull request: https://github.com/apache/spark/pull/21045#discussion_r190014678 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -127,6 +127,148 @@ case class MapKeys(child: Expression) override def prettyName: String = "map_keys" } +@ExpressionDescription( + usage = """_FUNC_(a1, a2, ...) - Returns a merged array containing in the N-th position the + N-th value of each array given.""", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), array(2, 3, 4)); +[[1, 2], [2, 3], [3, 4]] + > SELECT _FUNC_(array(1, 2), array(2, 3), array(3, 4)); +[[1, 2, 3], [2, 3, 4]] + """, + since = "2.4.0") +case class Zip(children: Seq[Expression]) extends Expression with ExpectsInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq.fill(children.length)(ArrayType) + + override def dataType: DataType = ArrayType(mountSchema) + + override def prettyName: String = "zip" + + override def nullable: Boolean = children.forall(_.nullable) + + lazy val numberOfArrays: Int = children.length + + private lazy val arrayTypes = children.map(_.dataType.asInstanceOf[ArrayType]) + + private lazy val arrayElementTypes = arrayTypes.map(_.elementType) + + def mountSchema: StructType = { +val fields = arrayTypes.zipWithIndex.foldRight(List[StructField]()) { case ((arr, idx), list) => + StructField(s"_$idx", arr.elementType, children(idx).nullable || arr.containsNull) :: list +} +StructType(fields) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val genericArrayData = classOf[GenericArrayData].getName +val genericInternalRow = classOf[GenericInternalRow].getName + +val evals = children.map(_.genCode(ctx)) + +val arrVals = ctx.freshName("arrVals") +val arrCardinality = ctx.freshName("arrCardinality") +val biggestCardinality = ctx.freshName("biggestCardinality") +val storedArrTypes = ctx.freshName("storedArrTypes") + +val inputs = evals.zipWithIndex.map { case (eval, index) => + s""" +|${eval.code} +|if (!${eval.isNull}) { +| $arrVals[$index] = ${eval.value}; +| $arrCardinality[$index] = ${eval.value}.numElements(); +|} else { +| $arrVals[$index] = null; +| $arrCardinality[$index] = 0; +|} +|$storedArrTypes[$index] = "${arrayElementTypes(index)}"; +|$biggestCardinality = Math.max($biggestCardinality, $arrCardinality[$index]); + """.stripMargin +} + +val inputsSplitted = ctx.splitExpressions( + expressions = inputs, + funcName = "getInputAndCardinality", + returnType = "int", + makeSplitFunction = body => +s""" + |$body + |return $biggestCardinality; +""".stripMargin, + foldFunctions = _.map(funcCall => s"$biggestCardinality = $funcCall;").mkString("\n"), + arguments = +("ArrayData[]", arrVals) :: +("int[]", arrCardinality) :: +("String[]", storedArrTypes) :: +("int", biggestCardinality) :: Nil) + +val myobject = ctx.freshName("myobject") +val j = ctx.freshName("j") +val i = ctx.freshName("i") +val args = ctx.freshName("args") + +val fillValue = arrayElementTypes.distinct.map { case (elementType) => + val getArrValsItem = CodeGenerator.getValue(s"$arrVals[$j]", elementType, i) + s""" + |if ($storedArrTypes[$j] == "${elementType}") { + | $myobject[$j] = $getArrValsItem; + |} + """.stripMargin +}.mkString("\n") + +ev.copy(s""" + |ArrayData[] $arrVals = new ArrayData[$numberOfArrays]; + |int[] $arrCardinality = new int[$numberOfArrays]; + |int $biggestCardinality = 0; + |String[] $storedArrTypes = new String[$numberOfArrays]; + |$inputsSplitted + |Object[] $args = 
new Object[$biggestCardinality]; + |for (int $i = 0; $i < $biggestCardinality; $i ++) { + | Object[] $myobject = new Object[$numberOfArrays]; + | for (int $j = 0; $j < $numberOfArrays; $j ++) { + |if ($arrVals[$j] != null && $arrCardinality[$j] > $i && !$arrVals[$j].isNullAt($i)) { + | $fillValue + |} else { + | $myobject[$j] = null; + |} + | } + | $args[$i] = new $genericInternalRow($myobject); + |} + |boolean ${ev.isNull} = false; --- End diff -- I didn't check how Presto
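To complement the generated-code path above, here is a minimal non-codegen sketch of the semantics the `zip` expression implements (pad to the longest input with nulls), matching the examples in the ExpressionDescription; the helper name is made up for illustration:

```scala
// Semantics sketch: zip N sequences element-wise, padding shorter ones with null.
def zipArrays(arrays: Seq[Seq[Any]]): Seq[Seq[Any]] = {
  val longest = if (arrays.isEmpty) 0 else arrays.map(_.length).max
  (0 until longest).map { i =>
    arrays.map(arr => if (i < arr.length) arr(i) else null)
  }
}

// zipArrays(Seq(Seq(1, 2, 3), Seq(2, 3, 4)))
//   => Seq(Seq(1, 2), Seq(2, 3), Seq(3, 4))
// zipArrays(Seq(Seq(1, 2), Seq(2, 3), Seq(3, 4)))
//   => Seq(Seq(1, 2, 3), Seq(2, 3, 4))
```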