[GitHub] spark pull request #21385: [SPARK-24234][SS] Support multiple row writers in...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/21385#discussion_r190057347 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowReceiver.scala --- @@ -56,20 +62,46 @@ private[shuffle] class UnsafeRowReceiver( override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = { case r: UnsafeRowReceiverMessage => - queue.put(r) + queues(r.writerId).put(r) context.reply(()) } override def read(): Iterator[UnsafeRow] = { new NextIterator[UnsafeRow] { - override def getNext(): UnsafeRow = queue.take() match { -case ReceiverRow(r) => r -case ReceiverEpochMarker() => - finished = true - null + private val numWriterEpochMarkers = new AtomicInteger(0) --- End diff -- Does this need an AtomicInteger? It seems like this is only incremented and accessed from the main thread draining the iterator below. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
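A minimal, self-contained illustration of the point (hypothetical names, not the actual UnsafeRowReceiver code): when a counter is only incremented and read by the single thread draining the queue, a plain var behaves the same as an AtomicInteger.

```scala
import java.util.concurrent.BlockingQueue

sealed trait Msg
case class Row(value: Int) extends Msg
case object Marker extends Msg

// Hypothetical single-consumer reader: next() is only ever called from one thread,
// so the marker count never needs atomic updates.
class SingleThreadedReader(queue: BlockingQueue[Msg], numWriters: Int) {
  private var markersSeen = 0  // only mutated and read by the draining thread

  def next(): Option[Int] = {
    while (markersSeen < numWriters) {
      queue.take() match {
        case Row(v) => return Some(v)
        case Marker => markersSeen += 1  // no AtomicInteger needed
      }
    }
    None  // every writer has sent its epoch marker; the epoch is over
  }
}
```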
[GitHub] spark pull request #21385: [SPARK-24234][SS] Support multiple row writers in...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/21385#discussion_r190057920 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowReceiver.scala --- @@ -56,20 +62,46 @@ private[shuffle] class UnsafeRowReceiver( override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = { case r: UnsafeRowReceiverMessage => - queue.put(r) + queues(r.writerId).put(r) context.reply(()) } override def read(): Iterator[UnsafeRow] = { new NextIterator[UnsafeRow] { - override def getNext(): UnsafeRow = queue.take() match { -case ReceiverRow(r) => r -case ReceiverEpochMarker() => - finished = true - null + private val numWriterEpochMarkers = new AtomicInteger(0) + + private val executor = Executors.newFixedThreadPool(numShuffleWriters) + private val completion = new ExecutorCompletionService[UnsafeRowReceiverMessage](executor) + + private def completionTask(writerId: Int) = new Callable[UnsafeRowReceiverMessage] { +override def call(): UnsafeRowReceiverMessage = queues(writerId).take() } - override def close(): Unit = {} + (0 until numShuffleWriters).foreach(writerId => completion.submit(completionTask(writerId))) + + override def getNext(): UnsafeRow = { +completion.take().get() match { + case ReceiverRow(writerId, r) => +// Start reading the next element in the queue we just took from. +completion.submit(completionTask(writerId)) +r +// TODO use writerId + case ReceiverEpochMarker(writerId) => +// Don't read any more from this queue. If all the writers have sent epoch markers, +// the epoch is over; otherwise we need rows from one of the remaining writers. +val writersCompleted = numWriterEpochMarkers.incrementAndGet() +if (writersCompleted == numShuffleWriters) { + finished = true + null +} else { + getNext() --- End diff -- With a large number of shuffle writers, there is a very slight chance that this can lead to a stack overflow, when you accidentally get a sequence of N markers from N writers and N is large. Make this a while loop. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
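A rough sketch of the suggested rewrite, reusing the identifiers from the diff above (untested, purely illustrative): the recursion on an epoch marker becomes a loop, so a run of N consecutive markers costs N iterations rather than N stack frames.

```scala
override def getNext(): UnsafeRow = {
  var result: UnsafeRow = null
  // Loop instead of recursing, so a long run of epoch markers cannot overflow the stack.
  while (result == null && !finished) {
    completion.take().get() match {
      case ReceiverRow(writerId, r) =>
        // Start reading the next element from the queue we just took from.
        completion.submit(completionTask(writerId))
        result = r
      case ReceiverEpochMarker(writerId) =>
        // Don't resubmit for this writer. If all writers have sent markers, the epoch is over.
        if (numWriterEpochMarkers.incrementAndGet() == numShuffleWriters) {
          finished = true
        }
    }
  }
  result
}
```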
[GitHub] spark issue #21394: [SPARK-24329][SQL] Test for skipping multi-space lines
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21394 **[Test build #90997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90997/testReport)** for PR 21394 at commit [`70839e8`](https://github.com/apache/spark/commit/70839e8bf56338698d750a9d7830dca29ae13dce). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21394: [SPARK-24329][SQL] Test for skipping multi-space lines
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21394 jenkins, retest this, please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21082: [SPARK-22239][SQL][Python] Enable grouped aggregate pand...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21082 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3477/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21082: [SPARK-22239][SQL][Python] Enable grouped aggregate pand...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21082 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21401: [SPARK-24350][SQL] "array_position" error fix
Github user wajda commented on the issue: https://github.com/apache/spark/pull/21401 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21402: SPARK-24335 Spark external shuffle server improvement to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21402 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21402: SPARK-24335 Spark external shuffle server improve...
GitHub user Victsm opened a pull request: https://github.com/apache/spark/pull/21402 SPARK-24335 Spark external shuffle server improvement to better handle block fetch requests. ## What changes were proposed in this pull request? Right now, the default number of server-side netty handler threads is 2 * # of cores, and it can be further configured with the parameter spark.shuffle.io.serverThreads. In order to process a client request, one available server netty handler thread is required. However, when the server netty handler threads start to process ChunkFetchRequests, they will be blocked on disk I/O, mostly due to disk contention from the random read operations initiated by all the ChunkFetchRequests received from clients. As a result, when the shuffle server is serving many concurrent ChunkFetchRequests, the server-side netty handler threads could all be blocked on reading shuffle files, thus leaving no handler thread available to process other types of requests which should all be very quick to process. This issue could potentially be fixed by limiting the number of netty handler threads that could get blocked when processing ChunkFetchRequest. We have a patch to do this by using a separate EventLoopGroup with a dedicated ChannelHandler to process ChunkFetchRequest. This enables the shuffle server to reserve netty handler threads for non-ChunkFetchRequest messages, thus enabling consistent processing time for these requests which are fast to process. After deploying the patch in our infrastructure, we no longer see timeout issues with either executor registration with the local shuffle server or shuffle clients establishing connections with remote shuffle servers. ## How was this patch tested? Added a unit test for this patch. In addition, we built an internal tool to stress test the Spark shuffle service and have observed significant improvement after applying this patch. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Victsm/spark SPARK-24355 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21402.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21402 commit d862a2dec30f1fefe03d26e0b8a07d10eac7dbf5 Author: Min Shen Date: 2017-09-11T03:59:25Z SPARK-24335 Spark external shuffle server improvement to better handle block fetch requests. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
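The core idea — giving the disk-heavy request type its own thread pool so it cannot starve everything else — can be sketched directly with Netty. This is a hypothetical illustration, not the actual patch: the handler names and pool size are invented, and the real change lives in Spark's network layer.

```scala
import io.netty.channel.{ChannelInboundHandlerAdapter, ChannelPipeline}
import io.netty.channel.nio.NioEventLoopGroup
import io.netty.channel.socket.SocketChannel

// Hypothetical handlers standing in for Spark's real transport handlers.
class MainTransportHandler extends ChannelInboundHandlerAdapter

class ChunkFetchHandler extends ChannelInboundHandlerAdapter {
  // channelRead here would do the disk-heavy work of reading shuffle blocks and may block
}

object DedicatedFetchThreads {
  // A separate, bounded pool reserved for the blocking chunk-fetch path.
  val chunkFetchGroup = new NioEventLoopGroup(8)

  def configure(ch: SocketChannel): Unit = {
    val pipeline: ChannelPipeline = ch.pipeline()
    // Ordinary handlers keep running on the channel's own event loop.
    pipeline.addLast("transportHandler", new MainTransportHandler)
    // This handler is executed on chunkFetchGroup instead, so blocking on
    // shuffle-file reads cannot exhaust the main netty handler threads.
    pipeline.addLast(chunkFetchGroup, "chunkFetchHandler", new ChunkFetchHandler)
  }
}
```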
[GitHub] spark issue #21402: SPARK-24335 Spark external shuffle server improvement to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21402 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21082: [SPARK-22239][SQL][Python] Enable grouped aggregate pand...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21082 @ueshin I responded to your comments. Please let me know what you think, thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21082: [SPARK-22239][SQL][Python] Enable grouped aggrega...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21082#discussion_r190065854 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -268,3 +269,38 @@ object PhysicalAggregation { case _ => None } } + +/** + * An extractor used when planning physical execution of a window. This extractor outputs + * the window function type of the logical window. + * + * The input logical window must contain same type of window functions, which is ensured by + * the rule ExtractWindowExpressions in the analyzer. + */ +object PhysicalWindow { + // windowFunctionType, windowExpression, partitionSpec, orderSpec, child + type ReturnType = +(WindowFunctionType, Seq[NamedExpression], Seq[Expression], Seq[SortOrder], LogicalPlan) + + def unapply(a: Any): Option[ReturnType] = a match { +case expr @ logical.Window(windowExpressions, partitionSpec, orderSpec, child) => + + if (windowExpressions.isEmpty) { +throw new AnalysisException(s"Window expression is empty in $expr") --- End diff -- I added comments to explain this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21382: [SPARK-24332][SS][MESOS]Fix places reading 'spark.networ...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/21382 @felixcheung `spark.network.timeout` is one of the weirdest configs, as it may have different default values in different places. I think this is why it's not added to the global `org.apache.spark.internal.config`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
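To illustrate what "different default values in different places" can look like (the call sites and fallback values below are invented; only the pattern matters): each component reads the same key with its own fallback, which a single global ConfigEntry default cannot express.

```scala
import org.apache.spark.SparkConf

object NetworkTimeoutDefaults {
  private val conf = new SparkConf()

  // Hypothetical call sites: same key, component-specific fallbacks.
  val rpcAskTimeoutSec: Long    = conf.getTimeAsSeconds("spark.network.timeout", "120s")
  val heartbeatTimeoutSec: Long = conf.getTimeAsSeconds("spark.network.timeout", "240s")
}
```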
[GitHub] spark issue #21082: [SPARK-22239][SQL][Python] Enable grouped aggregate pand...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21082 **[Test build #90996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90996/testReport)** for PR 21082 at commit [`ed02823`](https://github.com/apache/spark/commit/ed028237d9892e8d1e32607e77dad24ad0252b05). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21401: [SPARK-24350][SQL] "array_position" error fix
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21401 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90990/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21401: [SPARK-24350][SQL] "array_position" error fix
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21401 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21401: [SPARK-24350][SQL] "array_position" error fix
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21401 **[Test build #90990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90990/testReport)** for PR 21401 at commit [`8e8f623`](https://github.com/apache/spark/commit/8e8f62385b3a81cd21b5fd033a5dee9bc7d8a040). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21331: [SPARK-24276][SQL] Order of literals in IN should...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21331#discussion_r190062760 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala --- @@ -85,6 +87,10 @@ object Canonicalize { case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r) case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r) +// order the list in the In operator +case In(value, list) => + In(value, list.sortBy(_.semanticHash())) --- End diff -- A few general suggestions:
- We need to add test cases when we want to cover the extra cases.
- Update the comments in the description when we add new code.
- Try our best to make the code simple.
- Find the best test suite when we add a new test. If unable to find a good one, create a new test suite.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21082: [SPARK-22239][SQL][Python] Enable grouped aggrega...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21082#discussion_r190061061 --- Diff: python/pyspark/sql/tests.py --- @@ -5181,6 +5190,235 @@ def test_invalid_args(self): 'mixture.*aggregate function.*group aggregate pandas UDF'): df.groupby(df.id).agg(mean_udf(df.v), mean(df.v)).collect() + +@unittest.skipIf( +not _have_pandas or not _have_pyarrow, +_pandas_requirement_message or _pyarrow_requirement_message) +class WindowPandasUDFTests(ReusedSQLTestCase): +@property +def data(self): +from pyspark.sql.functions import array, explode, col, lit +return self.spark.range(10).toDF('id') \ +.withColumn("vs", array([lit(i * 1.0) + col('id') for i in range(20, 30)])) \ +.withColumn("v", explode(col('vs'))) \ +.drop('vs') \ +.withColumn('w', lit(1.0)) + +@property +def python_plus_one(self): --- End diff -- I agree those should be shared, but let's do it maybe in a separate PR? Because the refactor will probably touch many test cases and the PR is quite large already.. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21082: [SPARK-22239][SQL][Python] Enable grouped aggrega...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21082#discussion_r190060687 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -268,3 +269,38 @@ object PhysicalAggregation { case _ => None } } + +/** + * An extractor used when planning physical execution of a window. This extractor outputs + * the window function type of the logical window. + * + * The input logical window must contain same type of window functions, which is ensured by + * the rule ExtractWindowExpressions in the analyzer. + */ +object PhysicalWindow { + // windowFunctionType, windowExpression, partitionSpec, orderSpec, child + type ReturnType = +(WindowFunctionType, Seq[NamedExpression], Seq[Expression], Seq[SortOrder], LogicalPlan) + + def unapply(a: Any): Option[ReturnType] = a match { +case expr @ logical.Window(windowExpressions, partitionSpec, orderSpec, child) => + + if (windowExpressions.isEmpty) { +throw new AnalysisException(s"Window expression is empty in $expr") --- End diff -- Window expression should not be empty here so this should not be reached. This is just for safety. Let me add a comment. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21082: [SPARK-22239][SQL][Python] Enable grouped aggrega...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21082#discussion_r190060086 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1869,6 +1870,8 @@ class Analyzer( case window: WindowExpression => window.windowSpec }.distinct +val windowFunctionType = WindowFunctionType.functionType(expr) --- End diff -- This is used for grouping window functions into SQL window functions and pandas UDFs, and for creating different logical nodes for them. The value is used here: https://github.com/apache/spark/pull/21082/files#diff-57b3d87be744b7d79a9beacf8e5e5eb2R1886 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
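Conceptually the grouping works like this (a simplified, hypothetical sketch — the real logic lives in the analyzer and uses Spark's internal expression types):

```scala
// Hypothetical stand-ins for the two kinds of window functions the analyzer distinguishes.
sealed trait WindowFnType
case object SqlWindowFn extends WindowFnType
case object PandasUdfWindowFn extends WindowFnType

case class WindowExpr(name: String, fnType: WindowFnType)

object WindowGrouping {
  // Group the window expressions of one window spec by function type, so that a
  // separate logical Window node can be planned for each group.
  def splitByFunctionType(exprs: Seq[WindowExpr]): Map[WindowFnType, Seq[WindowExpr]] =
    exprs.groupBy(_.fnType)
}

// e.g. WindowGrouping.splitByFunctionType(Seq(
//   WindowExpr("rank", SqlWindowFn), WindowExpr("mean_udf", PandasUdfWindowFn)))
// yields two groups, one per logical Window node.
```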
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21372 I didn't say ORC-301 resolves the issues of SPARK-23458 and SPARK-23390. SPARK-23458 and SPARK-23390 report open file leakages in some unknown situations, don't they? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21394: [SPARK-24329][SQL] Test for skipping multi-space lines
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21394 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21394: [SPARK-24329][SQL] Test for skipping multi-space lines
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21394 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90988/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21394: [SPARK-24329][SQL] Test for skipping multi-space lines
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21394 **[Test build #90988 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90988/testReport)** for PR 21394 at commit [`70839e8`](https://github.com/apache/spark/commit/70839e8bf56338698d750a9d7830dca29ae13dce). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class TaskKilled(` * `trait HasValidationIndicatorCol extends Params ` * `trait DivModLike extends BinaryArithmetic ` * `case class Divide(left: Expression, right: Expression) extends DivModLike ` * `case class Remainder(left: Expression, right: Expression) extends DivModLike ` * `class ReadOnlySQLConf(context: TaskContext) extends SQLConf ` * `class TaskContextConfigProvider(context: TaskContext) extends ConfigProvider ` * `case class ContinuousShuffleReadPartition(index: Int, queueSize: Int) extends Partition ` * `class ContinuousShuffleReadRDD(` * `trait ContinuousShuffleReader ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #9503: [SPARK-9818] Re-enable Docker tests for JDBC data source
Github user nonsleepr commented on the issue: https://github.com/apache/spark/pull/9503 Adding following for better visibility. The problem while using [`com.whisk:docker-testkit-impl-spotify`](https://github.com/whisklabs/docker-it-scala) along with Spark: ```log com.spotify.docker.client.exceptions.DockerException: java.util.concurrent.ExecutionException: javax.ws.rs.ProcessingException: java.lang.IncompatibleClassChangeError: Found interface org.objectweb.asm.ClassVisitor, but class was expected at com.spotify.docker.client.DefaultDockerClient.propagate(DefaultDockerClient.java:2657) at com.spotify.docker.client.DefaultDockerClient.request(DefaultDockerClient.java:2588) at com.spotify.docker.client.DefaultDockerClient.info(DefaultDockerClient.java:522) ... 40 elided Caused by: java.util.concurrent.ExecutionException: javax.ws.rs.ProcessingException: java.lang.IncompatibleClassChangeError: Found interface org.objectweb.asm.ClassVisitor, but class was expected at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) at com.spotify.docker.client.DefaultDockerClient.request(DefaultDockerClient.java:2586) ... 41 more Caused by: javax.ws.rs.ProcessingException: java.lang.IncompatibleClassChangeError: Found interface org.objectweb.asm.ClassVisitor, but class was expected at org.glassfish.jersey.client.ClientRuntime.processFailure(ClientRuntime.java:202) at org.glassfish.jersey.client.ClientRuntime.access$400(ClientRuntime.java:79) at org.glassfish.jersey.client.ClientRuntime$2.run(ClientRuntime.java:182) at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) at org.glassfish.jersey.internal.Errors.process(Errors.java:315) at org.glassfish.jersey.internal.Errors.process(Errors.java:297) at org.glassfish.jersey.internal.Errors.process(Errors.java:267) at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:340) at org.glassfish.jersey.client.ClientRuntime$3.run(ClientRuntime.java:210) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ... 
1 more Caused by: java.lang.IncompatibleClassChangeError: Found interface org.objectweb.asm.ClassVisitor, but class was expected at jnr.ffi.provider.jffi.AsmLibraryLoader.generateInterfaceImpl(AsmLibraryLoader.java:74) at jnr.ffi.provider.jffi.AsmLibraryLoader.loadLibrary(AsmLibraryLoader.java:59) at jnr.ffi.provider.jffi.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:43) at jnr.ffi.LibraryLoader.load(LibraryLoader.java:287) at jnr.unixsocket.Native.(Native.java:76) at jnr.unixsocket.UnixSocketChannel.(UnixSocketChannel.java:68) at jnr.unixsocket.UnixSocketChannel.open(UnixSocketChannel.java:49) at com.spotify.docker.client.ApacheUnixSocket.(ApacheUnixSocket.java:59) at com.spotify.docker.client.UnixConnectionSocketFactory.createSocket(UnixConnectionSocketFactory.java:67) at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:118) at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353) at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380) at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236) at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184) at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88) at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:71) at org.glassfish.jersey.apache.connector.ApacheConnector.apply(ApacheConnector.java:435) at org.glassfish.jersey.apache.connector.ApacheConnector$1.run(ApacheConnector.java:491) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at jersey.repackaged.com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:299) at
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21372 For me, those three lines do not throw exceptions. Do you mean other lines?
```
OrcProto.PostScript ps;
OrcProto.FileTail.Builder fileTailBuilder = OrcProto.FileTail.newBuilder();
long modificationTime;
```
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21372 I am just trying to find out why ORC-301 resolves the issues of SPARK-23458 and SPARK-23390 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21372 https://github.com/dongjoon-hyun/orc/blob/cad48d6b11a65264a5b22c73aa2be9029aa72988/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L520-L522 Regarding the file leakage, I did not see any exception issued from these three lines in our log. Does that mean ORC eats the exceptions and attempts to re-open the files? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
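For context, the failure mode being discussed is the generic swallow-the-exception file-handle leak. The sketch below is not the ORC code, just an illustration of the pattern, with a plain FileInputStream standing in for the ORC reader's stream.

```scala
import java.io.FileInputStream

object FileLeakSketch {
  // Leaky pattern: if parsing throws and the exception is caught (or retried)
  // without closing `in`, the old file handle is never released.
  def leaky(path: String): Unit = {
    val in = new FileInputStream(path)
    try {
      throw new RuntimeException("simulated footer parse error")
    } catch {
      case _: RuntimeException => // swallowed: `in` is never closed here
    }
  }

  // Safe pattern: the handle is released even when parsing fails.
  def safe(path: String): Unit = {
    val in = new FileInputStream(path)
    try {
      // ... read and parse the file footer ...
    } finally {
      in.close()
    }
  }
}
```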
[GitHub] spark issue #21399: [SPARK-22269][BUILD] Run Java linter via SBT for Jenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21399 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90984/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21399: [SPARK-22269][BUILD] Run Java linter via SBT for Jenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21399 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21399: [SPARK-22269][BUILD] Run Java linter via SBT for Jenkins
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21399 **[Test build #90984 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90984/testReport)** for PR 21399 at commit [`7bb0eb3`](https://github.com/apache/spark/commit/7bb0eb3be6619ea9d0c7a023da5b665fecbc799e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20997 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21372 For the Timestamp issue, I'm trying to find an example. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21387: Correct reference to Offset class
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21387 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21387: Correct reference to Offset class
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90987/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21387: Correct reference to Offset class
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21387 **[Test build #90987 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90987/testReport)** for PR 21387 at commit [`81a5d3d`](https://github.com/apache/spark/commit/81a5d3fb31e4b7f9540c7b1ca432df44d79f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20997 That being the case, merging to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21372 For the file leakage issues, we have been monitoring the flakiness of SPARK-23458 and SPARK-23390 in our Jenkins environment. So far, I haven't been able to reproduce it locally. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21331: [SPARK-24276][SQL] Order of literals in IN should not af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21331 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21372 @gatorsmile. Basically, ORC-301 will reduce the chance of ORC file leakage in some cases. I made that patch and merged it a long time ago, but it is only released in this version. Also, ORC-306 fixes a bug involving Java `Timestamp` and its ORC workaround. Please see [here for the details of the Java Timestamp bug and the issue with the previous ORC workaround](https://issues.apache.org/jira/browse/ORC-306). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21331: [SPARK-24276][SQL] Order of literals in IN should not af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21331 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3476/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/21372#discussion_r190042175 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java --- @@ -136,7 +136,7 @@ public int getInt(int rowId) { public long getLong(int rowId) { int index = getRowIndex(rowId); if (isTimestamp) { - return timestampData.time[index] * 1000 + timestampData.nanos[index] / 1000; + return timestampData.time[index] * 1000 + timestampData.nanos[index] / 1000 % 1000; --- End diff -- No, what I mean is, with ORC-306 and this fix, there is no external impact outside Spark. More specifically, outside `OrcColumnVector`/`OrcColumnarBatchReader`. In other words, ORC 1.4.4 cannot be used with Apache Spark without this patch. Java `Timestamp.getTime` and `Timestamp.getNanos` have an overlap by definition. Previously, ORC didn't stick to the definition. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
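A quick worked check of the fixed expression, using `java.sql.Timestamp` directly as a stand-in for the column vector's `time`/`nanos` arrays (illustrative values only): `getTime()` already contains the millisecond part of the fraction, and `getNanos()` contains the whole fractional second, so only the sub-millisecond remainder should be added back when converting to microseconds.

```scala
import java.sql.Timestamp

object TimestampMicrosCheck extends App {
  // 1 ms plus 234 µs of fractional second.
  val ts = new Timestamp(1L)   // getTime == 1 (ms; already includes the millisecond part)
  ts.setNanos(1234000)         // full fractional second in nanos: 1_234_000

  // Old expression double-counts the millisecond part:
  val wrongMicros = ts.getTime * 1000 + ts.getNanos / 1000         // 1000 + 1234 = 2234
  // Fixed expression keeps only the sub-millisecond remainder of the nanos:
  val micros      = ts.getTime * 1000 + ts.getNanos / 1000 % 1000  // 1000 + 234  = 1234

  println(s"wrong = $wrongMicros, fixed = $micros")
}
```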
[GitHub] spark issue #21331: [SPARK-24276][SQL] Order of literals in IN should not af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21331 **[Test build #90995 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90995/testReport)** for PR 21331 at commit [`0c484f1`](https://github.com/apache/spark/commit/0c484f16a7bcb0f4b476decdf6820c4d89d4cfb0). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21331: [SPARK-24276][SQL] Order of literals in IN should...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21331#discussion_r190038976 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala --- @@ -85,6 +87,10 @@ object Canonicalize { case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r) case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r) +// order the list in the In operator +case In(value, list) => + In(value, list.sortBy(_.semanticHash())) --- End diff -- The only difference is that the elements in the list are canonicalized before the hash. I can't think of any meaningful example. The only one I can think of is something like ` mybool in (5 < 2)` and ` mybool in (not 5 >= 2)`. But in the future we may have more rules here that make using `semanticHash` meaningful, which is logically what we want here IMHO. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
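A toy model of what ordering the IN list buys (hypothetical types, not Spark's Expression hierarchy): after sorting children by a semantic hash, `a IN (2, 1)` and `a IN (1, 2)` canonicalize to the same form and therefore compare as semantically equal.

```scala
object InCanonicalizeSketch extends App {
  // Minimal stand-in for an expression tree with a semantic hash.
  sealed trait Expr { def semanticHash: Int = hashCode() }
  case class Attr(name: String) extends Expr
  case class Lit(value: Int) extends Expr
  case class In(value: Expr, list: Seq[Expr]) extends Expr

  // Simplified canonicalization rule: order the IN list deterministically.
  def canonicalize(e: Expr): Expr = e match {
    case In(v, list) => In(v, list.sortBy(_.semanticHash))
    case other       => other
  }

  val a = Attr("a")
  val e1 = In(a, Seq(Lit(2), Lit(1)))
  val e2 = In(a, Seq(Lit(1), Lit(2)))
  assert(canonicalize(e1) == canonicalize(e2))  // literal order no longer matters
  println("canonical forms match")
}
```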
[GitHub] spark pull request #21045: [SPARK-23931][SQL] Adds zip function to sparksql
Github user DylanGuedes commented on a diff in the pull request: https://github.com/apache/spark/pull/21045#discussion_r190036385 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -127,6 +127,148 @@ case class MapKeys(child: Expression) override def prettyName: String = "map_keys" } +@ExpressionDescription( + usage = """_FUNC_(a1, a2, ...) - Returns a merged array containing in the N-th position the + N-th value of each array given.""", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), array(2, 3, 4)); +[[1, 2], [2, 3], [3, 4]] + > SELECT _FUNC_(array(1, 2), array(2, 3), array(3, 4)); +[[1, 2, 3], [2, 3, 4]] + """, + since = "2.4.0") +case class Zip(children: Seq[Expression]) extends Expression with ExpectsInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq.fill(children.length)(ArrayType) + + override def dataType: DataType = ArrayType(mountSchema) + + override def prettyName: String = "zip" + + override def nullable: Boolean = children.forall(_.nullable) + + lazy val numberOfArrays: Int = children.length + + private lazy val arrayTypes = children.map(_.dataType.asInstanceOf[ArrayType]) + + private lazy val arrayElementTypes = arrayTypes.map(_.elementType) + + def mountSchema: StructType = { +val fields = arrayTypes.zipWithIndex.foldRight(List[StructField]()) { case ((arr, idx), list) => + StructField(s"_$idx", arr.elementType, children(idx).nullable || arr.containsNull) :: list +} +StructType(fields) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val genericArrayData = classOf[GenericArrayData].getName +val genericInternalRow = classOf[GenericInternalRow].getName + +val evals = children.map(_.genCode(ctx)) + +val arrVals = ctx.freshName("arrVals") +val arrCardinality = ctx.freshName("arrCardinality") +val biggestCardinality = ctx.freshName("biggestCardinality") +val storedArrTypes = ctx.freshName("storedArrTypes") + +val inputs = evals.zipWithIndex.map { case (eval, index) => + s""" +|${eval.code} +|if (!${eval.isNull}) { +| $arrVals[$index] = ${eval.value}; +| $arrCardinality[$index] = ${eval.value}.numElements(); +|} else { +| $arrVals[$index] = null; +| $arrCardinality[$index] = 0; +|} +|$storedArrTypes[$index] = "${arrayElementTypes(index)}"; +|$biggestCardinality = Math.max($biggestCardinality, $arrCardinality[$index]); + """.stripMargin +} + +val inputsSplitted = ctx.splitExpressions( + expressions = inputs, + funcName = "getInputAndCardinality", + returnType = "int", + makeSplitFunction = body => +s""" + |$body + |return $biggestCardinality; +""".stripMargin, + foldFunctions = _.map(funcCall => s"$biggestCardinality = $funcCall;").mkString("\n"), + arguments = +("ArrayData[]", arrVals) :: +("int[]", arrCardinality) :: +("String[]", storedArrTypes) :: +("int", biggestCardinality) :: Nil) + +val myobject = ctx.freshName("myobject") +val j = ctx.freshName("j") +val i = ctx.freshName("i") +val args = ctx.freshName("args") + +val fillValue = arrayElementTypes.distinct.map { case (elementType) => + val getArrValsItem = CodeGenerator.getValue(s"$arrVals[$j]", elementType, i) + s""" + |if ($storedArrTypes[$j] == "${elementType}") { + | $myobject[$j] = $getArrValsItem; + |} --- End diff -- I don't get it, how? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21398 **[Test build #90994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90994/testReport)** for PR 21398 at commit [`5f64a95`](https://github.com/apache/spark/commit/5f64a95e9557feb3366cf2ac986de6da8ce165a4). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21395: [SPARK-24348][SQL] "element_at" error fix
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21395 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21395: [SPARK-24348][SQL] "element_at" error fix
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21395 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90986/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21395: [SPARK-24348][SQL] "element_at" error fix
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21395 **[Test build #90986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90986/testReport)** for PR 21395 at commit [`045307b`](https://github.com/apache/spark/commit/045307bc4088ca471e16eb1a48c3cc1b75e8bf61). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21395: [SPARK-24348][SQL] "element_at" error fix
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21395 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21045: [SPARK-23931][SQL] Adds zip function to sparksql
Github user DylanGuedes commented on a diff in the pull request: https://github.com/apache/spark/pull/21045#discussion_r190034892 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -127,6 +127,148 @@ case class MapKeys(child: Expression) override def prettyName: String = "map_keys" } +@ExpressionDescription( + usage = """_FUNC_(a1, a2, ...) - Returns a merged array containing in the N-th position the + N-th value of each array given.""", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), array(2, 3, 4)); +[[1, 2], [2, 3], [3, 4]] + > SELECT _FUNC_(array(1, 2), array(2, 3), array(3, 4)); +[[1, 2, 3], [2, 3, 4]] + """, + since = "2.4.0") +case class Zip(children: Seq[Expression]) extends Expression with ExpectsInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq.fill(children.length)(ArrayType) + + override def dataType: DataType = ArrayType(mountSchema) + + override def prettyName: String = "zip" + + override def nullable: Boolean = children.forall(_.nullable) + + lazy val numberOfArrays: Int = children.length + + private lazy val arrayTypes = children.map(_.dataType.asInstanceOf[ArrayType]) + + private lazy val arrayElementTypes = arrayTypes.map(_.elementType) + + def mountSchema: StructType = { +val fields = arrayTypes.zipWithIndex.foldRight(List[StructField]()) { case ((arr, idx), list) => + StructField(s"_$idx", arr.elementType, children(idx).nullable || arr.containsNull) :: list +} +StructType(fields) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val genericArrayData = classOf[GenericArrayData].getName +val genericInternalRow = classOf[GenericInternalRow].getName + +val evals = children.map(_.genCode(ctx)) + +val arrVals = ctx.freshName("arrVals") +val arrCardinality = ctx.freshName("arrCardinality") +val biggestCardinality = ctx.freshName("biggestCardinality") +val storedArrTypes = ctx.freshName("storedArrTypes") + +val inputs = evals.zipWithIndex.map { case (eval, index) => + s""" +|${eval.code} +|if (!${eval.isNull}) { +| $arrVals[$index] = ${eval.value}; +| $arrCardinality[$index] = ${eval.value}.numElements(); +|} else { +| $arrVals[$index] = null; +| $arrCardinality[$index] = 0; +|} +|$storedArrTypes[$index] = "${arrayElementTypes(index)}"; +|$biggestCardinality = Math.max($biggestCardinality, $arrCardinality[$index]); + """.stripMargin +} + +val inputsSplitted = ctx.splitExpressions( + expressions = inputs, + funcName = "getInputAndCardinality", + returnType = "int", + makeSplitFunction = body => +s""" + |$body + |return $biggestCardinality; +""".stripMargin, + foldFunctions = _.map(funcCall => s"$biggestCardinality = $funcCall;").mkString("\n"), + arguments = +("ArrayData[]", arrVals) :: +("int[]", arrCardinality) :: +("String[]", storedArrTypes) :: +("int", biggestCardinality) :: Nil) + +val myobject = ctx.freshName("myobject") +val j = ctx.freshName("j") +val i = ctx.freshName("i") +val args = ctx.freshName("args") + +val fillValue = arrayElementTypes.distinct.map { case (elementType) => + val getArrValsItem = CodeGenerator.getValue(s"$arrVals[$j]", elementType, i) + s""" + |if ($storedArrTypes[$j] == "${elementType}") { + | $myobject[$j] = $getArrValsItem; + |} + """.stripMargin +}.mkString("\n") + +ev.copy(s""" + |ArrayData[] $arrVals = new ArrayData[$numberOfArrays]; + |int[] $arrCardinality = new int[$numberOfArrays]; + |int $biggestCardinality = 0; + |String[] $storedArrTypes = new String[$numberOfArrays]; + |$inputsSplitted + |Object[] $args = 
new Object[$biggestCardinality]; + |for (int $i = 0; $i < $biggestCardinality; $i ++) { + | Object[] $myobject = new Object[$numberOfArrays]; + | for (int $j = 0; $j < $numberOfArrays; $j ++) { + |if ($arrVals[$j] != null && $arrCardinality[$j] > $i && !$arrVals[$j].isNullAt($i)) { + | $fillValue + |} else { + | $myobject[$j] = null; + |} + | } + | $args[$i] = new $genericInternalRow($myobject); + |} + |boolean ${ev.isNull} = false; --- End diff -- @ueshin After some time
[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20929 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90979/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20929 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21069 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90985/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21069 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20929 **[Test build #90979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90979/testReport)** for PR 20929 at commit [`6c4592d`](https://github.com/apache/spark/commit/6c4592d2b8f008adfed79a31a8373641d4f4f550). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21395: [SPARK-24348][SQL] "element_at" error fix
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21395#discussion_r190033826 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1469,6 +1469,7 @@ case class ElementAt(left: Expression, right: Expression) extends GetMapValueUti left.dataType match { case _: ArrayType => IntegerType case _: MapType => left.dataType.asInstanceOf[MapType].keyType +case _ => AnyDataType // no match for a wrong 'left' expression type --- End diff -- BTW, normally, we prefer to
```
override def inputTypes: Seq[AbstractDataType] = {
  val rightInputType = left.dataType match {
    case _: ArrayType => IntegerType
    case _: MapType => left.dataType.asInstanceOf[MapType].keyType
    case _ => AnyDataType // no match for a wrong 'left' expression type
  }
  Seq(TypeCollection(ArrayType, MapType), rightInputType)
}
```
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21398 **[Test build #90992 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90992/testReport)** for PR 21398 at commit [`0f4be31`](https://github.com/apache/spark/commit/0f4be3110df5f7a92313260872994bc18685332e). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21398 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21069 **[Test build #90985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90985/testReport)** for PR 21069 at commit [`9281ae2`](https://github.com/apache/spark/commit/9281ae233dc54dd961e99e345be559929232c148). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21398 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90992/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20894 **[Test build #90993 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90993/testReport)** for PR 20894 at commit [`7dce1e7`](https://github.com/apache/spark/commit/7dce1e72f080044ba471fa08bb572452d4b0c907). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21398 **[Test build #90992 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90992/testReport)** for PR 21398 at commit [`0f4be31`](https://github.com/apache/spark/commit/0f4be3110df5f7a92313260872994bc18685332e). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21399: [SPARK-22269][BUILD] Run Java linter via SBT for Jenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21399 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21399: [SPARK-22269][BUILD] Run Java linter via SBT for Jenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21399 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90977/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21399: [SPARK-22269][BUILD] Run Java linter via SBT for Jenkins
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21399 **[Test build #90977 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90977/testReport)** for PR 21399 at commit [`558b1be`](https://github.com/apache/spark/commit/558b1beaa73cc536588c9a99b291ea7dcd72a1ff). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21398 (Although I'm pretty sure it will fail style checks.) Also, @cloud-fan since you wrote the original change. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21398 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21372 A few basic questions about this upgrade. What are the benefits of these nine trivial patches? If there is no impact on Spark users, we should not upgrade; if the new release fixes bugs, we need to add test cases to verify the fixes. Please prove the necessity of the upgrade. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21372#discussion_r190030340 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java --- @@ -136,7 +136,7 @@ public int getInt(int rowId) { public long getLong(int rowId) { int index = getRowIndex(rowId); if (isTimestamp) { - return timestampData.time[index] * 1000 + timestampData.nanos[index] / 1000; + return timestampData.time[index] * 1000 + timestampData.nanos[index] / 1000 % 1000; --- End diff -- Are you saying no external impact of ORC-306? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21399: [SPARK-22269][BUILD] Run Java linter via SBT for Jenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21399 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90980/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21399: [SPARK-22269][BUILD] Run Java linter via SBT for Jenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21399 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21399: [SPARK-22269][BUILD] Run Java linter via SBT for Jenkins
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21399 **[Test build #90980 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90980/testReport)** for PR 21399 at commit [`54d070d`](https://github.com/apache/spark/commit/54d070d2517c2dd1db33347131f049d0d08f73dc). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...
Github user szyszy commented on the issue: https://github.com/apache/spark/pull/20761 @vanzin, @galv: I think all of your comments are either addressed or answered; please check once again! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21397 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3475/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21397 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21395: [SPARK-24348][SQL] "element_at" error fix
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21395 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90974/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask and Use `whi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21381 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90975/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...
Github user szyszy commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r190024350 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceTypeHelper.scala --- @@ -0,0 +1,180 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.yarn + +import java.lang.reflect.InvocationTargetException + +import scala.util.control.NonFatal + +import org.apache.hadoop.yarn.api.records.Resource + +import org.apache.spark.internal.Logging +import org.apache.spark.util.Utils + +/** + * This helper class uses some of Hadoop 3 methods from the yarn API, + * so we need to use reflection to avoid compile error when building against Hadoop 2.x + */ +object ResourceTypeHelper extends Logging { + private val AMOUNT_AND_UNIT_REGEX = "([0-9]+)([A-Za-z]*)".r + private val resourceTypesNotAvailableErrorMessage = +"Ignoring updating resource with resource types because " + +"the version of YARN does not support it!" + + def setResourceInfoFromResourceTypes( + resourceTypesParam: Map[String, String], + resource: Resource): Resource = { +if (resource == null) { + throw new IllegalArgumentException("Resource parameter should not be null!") +} + +if (!ResourceTypeHelper.isYarnResourceTypesAvailable()) { + logWarning(resourceTypesNotAvailableErrorMessage) + return resource +} + +val resourceTypes = resourceTypesParam.map { case (k, v) => ( + if (k.equals("memory")) { +logWarning("Trying to use 'memory' as a custom resource, converted it to 'memory-mb'") +"memory-mb" + } else k, v) +} + +logDebug(s"Custom resource types: $resourceTypes") +resourceTypes.foreach { rt => + val resourceName: String = rt._1 + val (amount, unit) = getAmountAndUnit(rt._2) + logDebug(s"Registering resource with name: $resourceName, amount: $amount, unit: $unit") + + try { +val resInfoClass = Utils.classForName( + "org.apache.hadoop.yarn.api.records.ResourceInformation") +val setResourceInformationMethod = + resource.getClass.getMethod("setResourceInformation", classOf[String], +resInfoClass) + +val resourceInformation = + createResourceInformation(resourceName, amount, unit, resInfoClass) +setResourceInformationMethod.invoke(resource, resourceName, resourceInformation) + } catch { +case e: InvocationTargetException => + if (e.getCause != null) { +throw e.getCause + } else { +throw e + } +case NonFatal(e) => + logWarning(resourceTypesNotAvailableErrorMessage, e) + } +} +resource + } + + def getCustomResourcesAsStrings(resource: Resource): String = { +if (resource == null) { + throw new IllegalArgumentException("Resource parameter should not be null!") +} + +if (!ResourceTypeHelper.isYarnResourceTypesAvailable()) { + logWarning(resourceTypesNotAvailableErrorMessage) + return "" +} + +var res: String = "" 
+try { + val resUtilsClass = Utils.classForName( +"org.apache.hadoop.yarn.util.resource.ResourceUtils") + val getNumberOfResourceTypesMethod = resUtilsClass.getMethod("getNumberOfKnownResourceTypes") + val numberOfResourceTypes: Int = getNumberOfResourceTypesMethod.invoke(null).asInstanceOf[Int] + val resourceClass = Utils.classForName( +"org.apache.hadoop.yarn.api.records.Resource") + + // skip memory and vcores (index 0 and 1) + for (i <- 2 until numberOfResourceTypes) { +val getResourceInfoMethod = resourceClass.getMethod("getResourceInformation", + classOf[Int]) +
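As a side note for readers following the reflection-heavy diff above, a minimal standalone sketch of the same pattern — calling the Hadoop 3-only `setResourceInformation` API without a compile-time dependency — may help; the `newInstance` factory call and the silent fallback are assumptions for illustration, not code from the PR:

```scala
import org.apache.hadoop.yarn.api.records.Resource
import org.apache.spark.util.Utils

// Sketch: set a custom resource type on a YARN Resource via reflection so the
// code still compiles (and degrades gracefully) against Hadoop 2.x.
def trySetResource(resource: Resource, name: String, amount: Long): Unit = {
  try {
    val resInfoClass = Utils.classForName(
      "org.apache.hadoop.yarn.api.records.ResourceInformation")
    // static ResourceInformation.newInstance(String name, long value)
    val newInstance = resInfoClass.getMethod("newInstance", classOf[String], classOf[Long])
    val resInfo = newInstance.invoke(null, name, java.lang.Long.valueOf(amount))
    val setter = resource.getClass.getMethod(
      "setResourceInformation", classOf[String], resInfoClass)
    setter.invoke(resource, name, resInfo)
  } catch {
    case _: ClassNotFoundException | _: NoSuchMethodException =>
      // Hadoop 2.x: custom resource types are not supported; skip silently in this sketch.
  }
}
```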
[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask and Use `whi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21381 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21395: [SPARK-24348][SQL] "element_at" error fix
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21395 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21395: [SPARK-24348][SQL] "element_at" error fix
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21395 **[Test build #90974 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90974/testReport)** for PR 21395 at commit [`ec2e4a6`](https://github.com/apache/spark/commit/ec2e4a6e60e93afbfdce918f63f43a1862f52f83). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask and Use `whi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21381 **[Test build #90975 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90975/testReport)** for PR 21381 at commit [`54b3b5f`](https://github.com/apache/spark/commit/54b3b5f1c973638d4c32d2b297c8b7c9ff72f28a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21380: [SPARK-24329][SQL] Remove comments filtering befo...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21380#discussion_r190023523 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -300,14 +302,11 @@ private[csv] object UnivocityParser { lines } -val filteredLines: Iterator[String] = - CSVUtils.filterCommentAndEmpty(linesWithoutHeader, options) --- End diff -- Actually, the test from #21394 shows a case where this PR changes behavior: empty lines consisting of multiple whitespaces, with `ignoreLeadingWhiteSpace` set to `false` (the default), produce `null`s. The uniVocity parser can ignore lines of multiple whitespaces only when `ignoreLeadingWhiteSpace` (or `ignoreTrailingWhiteSpace`) is set to `true`. So there is no combination of CSV options that preserves the default behavior of the current implementation. I would propose closing this PR and adding the test from #21394 to CSVSuite to make sure we do not break the behavior described above. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
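To make the scenario concrete, here is a small self-contained reproduction of the setup being discussed; the file contents, schema, and option values are illustrative assumptions rather than the exact test in #21394:

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("ws-lines").getOrCreate()

// A CSV file whose second line contains only spaces.
val path = Files.createTempFile("ws-lines", ".csv")
Files.write(path, "1,a\n   \n2,b\n".getBytes("UTF-8"))

val df = spark.read
  .schema("c0 INT, c1 STRING")
  // false is the default for reading; per the discussion above, with this value the
  // uniVocity parser does not treat the whitespace-only line as empty, so removing
  // Spark's own empty-line filtering can surface it as a row of nulls.
  .option("ignoreLeadingWhiteSpace", "false")
  .csv(path.toString)

df.show()
```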
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21397 @BryanCutler @HyukjinKwon I was able to reproduce it in a unit test. Please take a look, thanks! :) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21397 **[Test build #90991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90991/testReport)** for PR 21397 at commit [`69c9104`](https://github.com/apache/spark/commit/69c91043981056a732328b474986fe4128936c62). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20997 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20997 **[Test build #90989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90989/testReport)** for PR 20997 at commit [`6cd67c6`](https://github.com/apache/spark/commit/6cd67c6ac7b948eb791cc4871477ab0b1df4fcad). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20997 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90989/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...
Github user yuchaoran2011 commented on the issue: https://github.com/apache/spark/pull/21398 Thanks @vanzin. I've verified this commit has all the changes you mentioned. I've also removed references to CDH in the code comments. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21401: [SPARK-24350][SQL] "array_position" error fix
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21401 **[Test build #90990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90990/testReport)** for PR 21401 at commit [`8e8f623`](https://github.com/apache/spark/commit/8e8f62385b3a81cd21b5fd033a5dee9bc7d8a040). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21401: [SPARK-24350][SQL] "array_position" error fix
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21401 ok to test. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/20997 I'm fine as well. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21397 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21397 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90972/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21397 **[Test build #90972 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90972/testReport)** for PR 21397 at commit [`435ccff`](https://github.com/apache/spark/commit/435ccfff44995ca5ad487e77128b2cae4ff1cfd5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20929 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20929 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90978/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21045: [SPARK-23931][SQL] Adds zip function to sparksql
Github user DylanGuedes commented on a diff in the pull request: https://github.com/apache/spark/pull/21045#discussion_r190014678 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -127,6 +127,148 @@ case class MapKeys(child: Expression) override def prettyName: String = "map_keys" } +@ExpressionDescription( + usage = """_FUNC_(a1, a2, ...) - Returns a merged array containing in the N-th position the + N-th value of each array given.""", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), array(2, 3, 4)); +[[1, 2], [2, 3], [3, 4]] + > SELECT _FUNC_(array(1, 2), array(2, 3), array(3, 4)); +[[1, 2, 3], [2, 3, 4]] + """, + since = "2.4.0") +case class Zip(children: Seq[Expression]) extends Expression with ExpectsInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq.fill(children.length)(ArrayType) + + override def dataType: DataType = ArrayType(mountSchema) + + override def prettyName: String = "zip" + + override def nullable: Boolean = children.forall(_.nullable) + + lazy val numberOfArrays: Int = children.length + + private lazy val arrayTypes = children.map(_.dataType.asInstanceOf[ArrayType]) + + private lazy val arrayElementTypes = arrayTypes.map(_.elementType) + + def mountSchema: StructType = { +val fields = arrayTypes.zipWithIndex.foldRight(List[StructField]()) { case ((arr, idx), list) => + StructField(s"_$idx", arr.elementType, children(idx).nullable || arr.containsNull) :: list +} +StructType(fields) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val genericArrayData = classOf[GenericArrayData].getName +val genericInternalRow = classOf[GenericInternalRow].getName + +val evals = children.map(_.genCode(ctx)) + +val arrVals = ctx.freshName("arrVals") +val arrCardinality = ctx.freshName("arrCardinality") +val biggestCardinality = ctx.freshName("biggestCardinality") +val storedArrTypes = ctx.freshName("storedArrTypes") + +val inputs = evals.zipWithIndex.map { case (eval, index) => + s""" +|${eval.code} +|if (!${eval.isNull}) { +| $arrVals[$index] = ${eval.value}; +| $arrCardinality[$index] = ${eval.value}.numElements(); +|} else { +| $arrVals[$index] = null; +| $arrCardinality[$index] = 0; +|} +|$storedArrTypes[$index] = "${arrayElementTypes(index)}"; +|$biggestCardinality = Math.max($biggestCardinality, $arrCardinality[$index]); + """.stripMargin +} + +val inputsSplitted = ctx.splitExpressions( + expressions = inputs, + funcName = "getInputAndCardinality", + returnType = "int", + makeSplitFunction = body => +s""" + |$body + |return $biggestCardinality; +""".stripMargin, + foldFunctions = _.map(funcCall => s"$biggestCardinality = $funcCall;").mkString("\n"), + arguments = +("ArrayData[]", arrVals) :: +("int[]", arrCardinality) :: +("String[]", storedArrTypes) :: +("int", biggestCardinality) :: Nil) + +val myobject = ctx.freshName("myobject") +val j = ctx.freshName("j") +val i = ctx.freshName("i") +val args = ctx.freshName("args") + +val fillValue = arrayElementTypes.distinct.map { case (elementType) => + val getArrValsItem = CodeGenerator.getValue(s"$arrVals[$j]", elementType, i) + s""" + |if ($storedArrTypes[$j] == "${elementType}") { + | $myobject[$j] = $getArrValsItem; + |} + """.stripMargin +}.mkString("\n") + +ev.copy(s""" + |ArrayData[] $arrVals = new ArrayData[$numberOfArrays]; + |int[] $arrCardinality = new int[$numberOfArrays]; + |int $biggestCardinality = 0; + |String[] $storedArrTypes = new String[$numberOfArrays]; + |$inputsSplitted + |Object[] $args = 
new Object[$biggestCardinality]; + |for (int $i = 0; $i < $biggestCardinality; $i ++) { + | Object[] $myobject = new Object[$numberOfArrays]; + | for (int $j = 0; $j < $numberOfArrays; $j ++) { + |if ($arrVals[$j] != null && $arrCardinality[$j] > $i && !$arrVals[$j].isNullAt($i)) { + | $fillValue + |} else { + | $myobject[$j] = null; + |} + | } + | $args[$i] = new $genericInternalRow($myobject); + |} + |boolean ${ev.isNull} = false; --- End diff -- I didn't check how Presto
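To complement the generated-code path above, here is a minimal non-codegen sketch of the semantics the `zip` expression implements (pad to the longest input with nulls), matching the examples in the ExpressionDescription; the helper name is made up for illustration:

```scala
// Semantics sketch: zip N sequences element-wise, padding shorter ones with null.
def zipArrays(arrays: Seq[Seq[Any]]): Seq[Seq[Any]] = {
  val longest = if (arrays.isEmpty) 0 else arrays.map(_.length).max
  (0 until longest).map { i =>
    arrays.map(arr => if (i < arr.length) arr(i) else null)
  }
}

// zipArrays(Seq(Seq(1, 2, 3), Seq(2, 3, 4)))
//   => Seq(Seq(1, 2), Seq(2, 3), Seq(3, 4))
// zipArrays(Seq(Seq(1, 2), Seq(2, 3), Seq(3, 4)))
//   => Seq(Seq(1, 2, 3), Seq(2, 3, 4))
```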