[jira] [Assigned] (SPARK-32236) Local cluster should shutdown gracefully

2020-07-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32236:


Assignee: Apache Spark  (was: Kousuke Saruta)

> Local cluster should shutdown gracefully
> 
>
> Key: SPARK-32236
> URL: https://issues.apache.org/jira/browse/SPARK-32236
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Apache Spark
>Priority: Minor
>
> Almost every time we call sc.stop in local-cluster mode, exceptions like the
> following are thrown.
> {code:java}
> 20/07/09 08:36:45 ERROR TransportRequestHandler: Error while invoking 
> RpcHandler#receive() for one-way message.
> org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already stopped.
> at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:167)
> at 
> org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:150)
> at 
> org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:691)
> at 
> org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:253)
> at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:111)
> at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
> at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
> at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
> at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
> at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
> at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
> at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> at 
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that the RPC message KillExecutor, sent asynchronously from the
> Master, can be processed after the message loop has stopped in the Worker.

[jira] [Assigned] (SPARK-32236) Local cluster should shutdown gracefully

2020-07-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32236:


Assignee: Kousuke Saruta  (was: Apache Spark)

> Local cluster should shutdown gracefully
> 
>
> Key: SPARK-32236
> URL: https://issues.apache.org/jira/browse/SPARK-32236
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> Almost every time we call sc.stop in local-cluster mode, exceptions like the
> following are thrown.
> {code:java}
> 20/07/09 08:36:45 ERROR TransportRequestHandler: Error while invoking 
> RpcHandler#receive() for one-way message.
> org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already stopped.
> at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:167)
> at 
> org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:150)
> at 
> org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:691)
> at 
> org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:253)
> at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:111)
> at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
> at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
> at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
> at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
> at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
> at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
> at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> at 
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that the RPC message KillExecutor, sent asynchronously from the
> Master, can be processed after the message loop has stopped in the Worker.

[jira] [Commented] (SPARK-32236) Local cluster should shutdown gracefully

2020-07-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154148#comment-17154148
 ] 

Apache Spark commented on SPARK-32236:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/29049

> Local cluster should shutdown gracefully
> 
>
> Key: SPARK-32236
> URL: https://issues.apache.org/jira/browse/SPARK-32236
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> Almost every time we call sc.stop in local-cluster mode, exceptions like the
> following are thrown.
> {code:java}
> 20/07/09 08:36:45 ERROR TransportRequestHandler: Error while invoking 
> RpcHandler#receive() for one-way message.
> org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already stopped.
> at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:167)
> at 
> org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:150)
> at 
> org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:691)
> at 
> org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:253)
> at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:111)
> at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
> at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
> at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
> at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
> at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
> at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
> at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> at 
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that the RPC message KillExecutor, sent asynchronously from the
> Master, can be processed after the message loop has stopped in the Worker.

[jira] [Created] (SPARK-32236) Local cluster should shutdown gracefully

2020-07-08 Thread Kousuke Saruta (Jira)
Kousuke Saruta created SPARK-32236:
--

 Summary: Local cluster should shutdown gracefully
 Key: SPARK-32236
 URL: https://issues.apache.org/jira/browse/SPARK-32236
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.1.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


Almost every time we call sc.stop in local-cluster mode, exceptions like the
following are thrown.
{code:java}
20/07/09 08:36:45 ERROR TransportRequestHandler: Error while invoking 
RpcHandler#receive() for one-way message.
org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already stopped.
at 
org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:167)
at 
org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:150)
at 
org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:691)
at 
org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:253)
at 
org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:111)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at 
org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at 
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
{code}
The reason is that the RPC message KillExecutor, sent asynchronously from the
Master, can be processed after the message loop has stopped in the Worker.
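
For context, a minimal reproduction sketch (assuming a Scala build with spark-core on the classpath; the object name and the sample job are arbitrary):

{code:lang=scala}
import org.apache.spark.{SparkConf, SparkContext}

// Run a short job against a local-cluster master, then stop the context.
// During shutdown the Master sends KillExecutor to the Workers asynchronously,
// and the message can arrive after a Worker's RpcEnv has already stopped,
// which is when the RpcEnvStoppedException above shows up in the logs.
object LocalClusterShutdownRepro {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local-cluster[2, 1, 1024]") // 2 workers, 1 core, 1024 MB each
      .setAppName("local-cluster-shutdown-repro")
    val sc = new SparkContext(conf)
    sc.parallelize(1 to 100).count()
    sc.stop() // the exception is typically logged around here
  }
}
{code}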






[jira] [Updated] (SPARK-32193) update docs on regexp function

2020-07-08 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse updated SPARK-32193:
-
Description: 
Spark SQL supports the following usage; we may update the docs to make it known
to more users:
{code:java}
 select 'abc'  REGEXP '([a-z]+)';{code}
 

 

  was:Hive supports the regexp function; Spark SQL uses `rlike` instead of `regexp`.
We can update the docs to make it known to more users.
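
As a sketch of the behavior the docs could mention (run in spark-shell, where {{spark}} is the pre-built SparkSession; {{REGEXP}} is used here as it appears in this issue, alongside the already documented {{RLIKE}}):

{code:lang=scala}
// Both forms return true: '[a-z]+' matches a substring of 'abc'.
spark.sql("SELECT 'abc' REGEXP '([a-z]+)'").show()
spark.sql("SELECT 'abc' RLIKE '([a-z]+)'").show()
{code}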


> update  docs on regexp function
> ---
>
> Key: SPARK-32193
> URL: https://issues.apache.org/jira/browse/SPARK-32193
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: philipse
>Priority: Minor
>
> Spark SQL supports the following usage; we may update the docs to make it
> known to more users:
> {code:java}
>  select 'abc'  REGEXP '([a-z]+)';{code}
>  
>  






[jira] [Updated] (SPARK-32193) update docs on regexp function

2020-07-08 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse updated SPARK-32193:
-
Summary: update  docs on regexp function  (was: update migrate guide  docs 
on regexp function)

> update  docs on regexp function
> ---
>
> Key: SPARK-32193
> URL: https://issues.apache.org/jira/browse/SPARK-32193
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: philipse
>Priority: Minor
>
> Hive supports the regexp function; Spark SQL uses `rlike` instead of `regexp`. We
> can update the docs to make it known to more users.






[jira] [Assigned] (SPARK-31723) Flaky test: org.apache.spark.deploy.history.HistoryServerSuite

2020-07-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-31723:
-

Assignee: Zhongwei Zhu

> Flaky test: org.apache.spark.deploy.history.HistoryServerSuite
> --
>
> Key: SPARK-31723
> URL: https://issues.apache.org/jira/browse/SPARK-31723
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Zhongwei Zhu
>Priority: Major
> Fix For: 3.1.0
>
>
> HistoryServerSuite.static relative links are prefixed with uiRoot 
> (spark.ui.proxyBase)
> https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/5010/testReport/
> {code}
> sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 2 was 
> not greater than 4
>   at 
> org.scalatest.MatchersHelper$.indicateFailure(MatchersHelper.scala:343)
>   at 
> org.scalatest.Matchers$ShouldMethodHelper$.shouldMatcher(Matchers.scala:6723)
>   at org.scalatest.Matchers$AnyShouldWrapper.should(Matchers.scala:6759)
>   at 
> org.apache.spark.deploy.history.HistoryServerSuite.$anonfun$new$18(HistoryServerSuite.scala:388)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151)
>   at 
> org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
>   at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286)
>   at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
>   at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:58)
>   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221)
>   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214)
>   at 
> org.apache.spark.deploy.history.HistoryServerSuite.org$scalatest$BeforeAndAfter$$super$runTest(HistoryServerSuite.scala:66)
>   at org.scalatest.BeforeAndAfter.runTest(BeforeAndAfter.scala:203)
>   at org.scalatest.BeforeAndAfter.runTest$(BeforeAndAfter.scala:192)
>   at 
> org.apache.spark.deploy.history.HistoryServerSuite.runTest(HistoryServerSuite.scala:66)
>   at 
> org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229)
>   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381)
>   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458)
>   at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229)
>   at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
>   at org.scalatest.Suite.run(Suite.scala:1124)
>   at org.scalatest.Suite.run$(Suite.scala:1106)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
>   at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:518)
>   at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233)
>   at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:58)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   at 
> org.apache.spark.deploy.history.HistoryServerSuite.org$scalatest$BeforeAndAfter$$super$run(HistoryServerSuite.scala:66)
>   at org.scalatest.BeforeAndAfter.run(BeforeAndAfter.scala:258)
>   at org.scalatest.BeforeAndAfter.run$(BeforeAndAfter.scala:256)
>   at 
> org.apache.spark.deploy.history.HistoryServerSuite.run(HistoryServerSuite.scala:66)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
>   at 

[jira] [Resolved] (SPARK-31723) Flaky test: org.apache.spark.deploy.history.HistoryServerSuite

2020-07-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-31723.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 28970
[https://github.com/apache/spark/pull/28970]

> Flaky test: org.apache.spark.deploy.history.HistoryServerSuite
> --
>
> Key: SPARK-31723
> URL: https://issues.apache.org/jira/browse/SPARK-31723
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Major
> Fix For: 3.1.0
>
>
> HistoryServerSuite.static relative links are prefixed with uiRoot 
> (spark.ui.proxyBase)
> https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/5010/testReport/
> {code}
> sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 2 was 
> not greater than 4
>   at 
> org.scalatest.MatchersHelper$.indicateFailure(MatchersHelper.scala:343)
>   at 
> org.scalatest.Matchers$ShouldMethodHelper$.shouldMatcher(Matchers.scala:6723)
>   at org.scalatest.Matchers$AnyShouldWrapper.should(Matchers.scala:6759)
>   at 
> org.apache.spark.deploy.history.HistoryServerSuite.$anonfun$new$18(HistoryServerSuite.scala:388)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151)
>   at 
> org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
>   at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286)
>   at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
>   at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:58)
>   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221)
>   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214)
>   at 
> org.apache.spark.deploy.history.HistoryServerSuite.org$scalatest$BeforeAndAfter$$super$runTest(HistoryServerSuite.scala:66)
>   at org.scalatest.BeforeAndAfter.runTest(BeforeAndAfter.scala:203)
>   at org.scalatest.BeforeAndAfter.runTest$(BeforeAndAfter.scala:192)
>   at 
> org.apache.spark.deploy.history.HistoryServerSuite.runTest(HistoryServerSuite.scala:66)
>   at 
> org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229)
>   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381)
>   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458)
>   at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229)
>   at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
>   at org.scalatest.Suite.run(Suite.scala:1124)
>   at org.scalatest.Suite.run$(Suite.scala:1106)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
>   at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:518)
>   at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233)
>   at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:58)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   at 
> org.apache.spark.deploy.history.HistoryServerSuite.org$scalatest$BeforeAndAfter$$super$run(HistoryServerSuite.scala:66)
>   at org.scalatest.BeforeAndAfter.run(BeforeAndAfter.scala:258)
>   at org.scalatest.BeforeAndAfter.run$(BeforeAndAfter.scala:256)
>   at 
> org.apache.spark.deploy.history.HistoryServerSuite.run(HistoryServerSuite.scala:66)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>   at 
> 

[jira] [Commented] (SPARK-32024) Disk usage tracker went negative in HistoryServerDiskManager

2020-07-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154092#comment-17154092
 ] 

Apache Spark commented on SPARK-32024:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/29048

> Disk usage tracker went negative in HistoryServerDiskManager
> 
>
> Key: SPARK-32024
> URL: https://issues.apache.org/jira/browse/SPARK-32024
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0, 3.1.0
> Environment: System: Windows, Linux.
> Config:
> spark.history.retainedApplications 200
> spark.history.store.maxDiskUsage 10g
> spark.history.store.path /cache_hs
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Minor
> Fix For: 2.4.7, 3.0.1, 3.1.0
>
>
> After restarting the history server, we would randomly see the error below.
> h2. HTTP ERROR 500 java.lang.IllegalStateException: Disk usage tracker went 
> negative (now = -, delta = -)
> ||URI:|/history//*/stages/|
> ||STATUS:|500|
> ||MESSAGE:|java.lang.IllegalStateException: Disk usage tracker went negative 
> (now = -, delta = -)|
> ||SERVLET:|org.apache.spark.deploy.history.HistoryServer$$anon$1-6ce1f601|
> ||CAUSED BY:|java.lang.IllegalStateException: Disk usage tracker went 
> negative (now = -, delta = -)|
> h3. Caused by:
> java.lang.IllegalStateException: Disk usage tracker went negative (now = 
> -633925, delta = -38947) at 
> org.apache.spark.deploy.history.HistoryServerDiskManager.org$apache$spark$deploy$history$HistoryServerDiskManager$$updateUsage(HistoryServerDiskManager.scala:258)
>  at 
> org.apache.spark.deploy.history.HistoryServerDiskManager$Lease.rollback(HistoryServerDiskManager.scala:316)
>  at 
> org.apache.spark.deploy.history.FsHistoryProvider.loadDiskStore(FsHistoryProvider.scala:1192)
>  at 
> org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:363)
>  at 
> org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:191)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:163)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:135)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:161)
>  at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:56)
>  at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:52)
>  at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>  at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) 
> at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at 
> org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:89)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:101)
>  at 
> org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:248)
>  at 
> org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:101)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
> org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:763) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1631)
>  at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1618)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:549)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1363)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:489)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>  at 
> 

[jira] [Commented] (SPARK-32024) Disk usage tracker went negative in HistoryServerDiskManager

2020-07-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154091#comment-17154091
 ] 

Apache Spark commented on SPARK-32024:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/29048

> Disk usage tracker went negative in HistoryServerDiskManager
> 
>
> Key: SPARK-32024
> URL: https://issues.apache.org/jira/browse/SPARK-32024
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0, 3.1.0
> Environment: System: Windows, Linux.
> Config:
> spark.history.retainedApplications 200
> spark.history.store.maxDiskUsage 10g
> spark.history.store.path /cache_hs
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Minor
> Fix For: 2.4.7, 3.0.1, 3.1.0
>
>
> After restarting the history server, we would randomly see the error below.
> h2. HTTP ERROR 500 java.lang.IllegalStateException: Disk usage tracker went 
> negative (now = -, delta = -)
> ||URI:|/history//*/stages/|
> ||STATUS:|500|
> ||MESSAGE:|java.lang.IllegalStateException: Disk usage tracker went negative 
> (now = -, delta = -)|
> ||SERVLET:|org.apache.spark.deploy.history.HistoryServer$$anon$1-6ce1f601|
> ||CAUSED BY:|java.lang.IllegalStateException: Disk usage tracker went 
> negative (now = -, delta = -)|
> h3. Caused by:
> java.lang.IllegalStateException: Disk usage tracker went negative (now = 
> -633925, delta = -38947) at 
> org.apache.spark.deploy.history.HistoryServerDiskManager.org$apache$spark$deploy$history$HistoryServerDiskManager$$updateUsage(HistoryServerDiskManager.scala:258)
>  at 
> org.apache.spark.deploy.history.HistoryServerDiskManager$Lease.rollback(HistoryServerDiskManager.scala:316)
>  at 
> org.apache.spark.deploy.history.FsHistoryProvider.loadDiskStore(FsHistoryProvider.scala:1192)
>  at 
> org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:363)
>  at 
> org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:191)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:163)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:135)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:161)
>  at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:56)
>  at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:52)
>  at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>  at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) 
> at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at 
> org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:89)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:101)
>  at 
> org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:248)
>  at 
> org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:101)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
> org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:763) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1631)
>  at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1618)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:549)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1363)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:489)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>  at 
> 

[jira] [Commented] (SPARK-32024) Disk usage tracker went negative in HistoryServerDiskManager

2020-07-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154085#comment-17154085
 ] 

Apache Spark commented on SPARK-32024:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/29047

> Disk usage tracker went negative in HistoryServerDiskManager
> 
>
> Key: SPARK-32024
> URL: https://issues.apache.org/jira/browse/SPARK-32024
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0, 3.1.0
> Environment: System: Windows, Linux.
> Config:
> spark.history.retainedApplications 200
> spark.history.store.maxDiskUsage 10g
> spark.history.store.path /cache_hs
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Minor
> Fix For: 2.4.7, 3.0.1, 3.1.0
>
>
> After restarting the history server, we would randomly see the error below.
> h2. HTTP ERROR 500 java.lang.IllegalStateException: Disk usage tracker went 
> negative (now = -, delta = -)
> ||URI:|/history//*/stages/|
> ||STATUS:|500|
> ||MESSAGE:|java.lang.IllegalStateException: Disk usage tracker went negative 
> (now = -, delta = -)|
> ||SERVLET:|org.apache.spark.deploy.history.HistoryServer$$anon$1-6ce1f601|
> ||CAUSED BY:|java.lang.IllegalStateException: Disk usage tracker went 
> negative (now = -, delta = -)|
> h3. Caused by:
> java.lang.IllegalStateException: Disk usage tracker went negative (now = 
> -633925, delta = -38947) at 
> org.apache.spark.deploy.history.HistoryServerDiskManager.org$apache$spark$deploy$history$HistoryServerDiskManager$$updateUsage(HistoryServerDiskManager.scala:258)
>  at 
> org.apache.spark.deploy.history.HistoryServerDiskManager$Lease.rollback(HistoryServerDiskManager.scala:316)
>  at 
> org.apache.spark.deploy.history.FsHistoryProvider.loadDiskStore(FsHistoryProvider.scala:1192)
>  at 
> org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:363)
>  at 
> org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:191)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:163)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:135)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:161)
>  at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:56)
>  at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:52)
>  at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>  at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) 
> at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at 
> org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:89)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:101)
>  at 
> org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:248)
>  at 
> org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:101)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
> org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:763) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1631)
>  at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1618)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:549)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1363)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:489)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>  at 
> 

[jira] [Commented] (SPARK-32024) Disk usage tracker went negative in HistoryServerDiskManager

2020-07-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154086#comment-17154086
 ] 

Apache Spark commented on SPARK-32024:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/29047

> Disk usage tracker went negative in HistoryServerDiskManager
> 
>
> Key: SPARK-32024
> URL: https://issues.apache.org/jira/browse/SPARK-32024
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0, 3.1.0
> Environment: System: Windows, Linux.
> Config:
> spark.history.retainedApplications 200
> spark.history.store.maxDiskUsage 10g
> spark.history.store.path /cache_hs
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Minor
> Fix For: 2.4.7, 3.0.1, 3.1.0
>
>
> After restarting the history server, we would randomly see the error below.
> h2. HTTP ERROR 500 java.lang.IllegalStateException: Disk usage tracker went 
> negative (now = -, delta = -)
> ||URI:|/history//*/stages/|
> ||STATUS:|500|
> ||MESSAGE:|java.lang.IllegalStateException: Disk usage tracker went negative 
> (now = -, delta = -)|
> ||SERVLET:|org.apache.spark.deploy.history.HistoryServer$$anon$1-6ce1f601|
> ||CAUSED BY:|java.lang.IllegalStateException: Disk usage tracker went 
> negative (now = -, delta = -)|
> h3. Caused by:
> java.lang.IllegalStateException: Disk usage tracker went negative (now = 
> -633925, delta = -38947) at 
> org.apache.spark.deploy.history.HistoryServerDiskManager.org$apache$spark$deploy$history$HistoryServerDiskManager$$updateUsage(HistoryServerDiskManager.scala:258)
>  at 
> org.apache.spark.deploy.history.HistoryServerDiskManager$Lease.rollback(HistoryServerDiskManager.scala:316)
>  at 
> org.apache.spark.deploy.history.FsHistoryProvider.loadDiskStore(FsHistoryProvider.scala:1192)
>  at 
> org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:363)
>  at 
> org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:191)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:163)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:135)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:161)
>  at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:56)
>  at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:52)
>  at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>  at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) 
> at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at 
> org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:89)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:101)
>  at 
> org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:248)
>  at 
> org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:101)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
> org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:763) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1631)
>  at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1618)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:549)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1363)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:489)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>  at 
> 

[jira] [Resolved] (SPARK-32168) DSv2 SQL overwrite incorrectly uses static plan with hidden partitions

2020-07-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-32168.
---
Fix Version/s: 3.1.0
   3.0.1
   Resolution: Fixed

Issue resolved by pull request 28993
[https://github.com/apache/spark/pull/28993]

> DSv2 SQL overwrite incorrectly uses static plan with hidden partitions
> --
>
> Key: SPARK-32168
> URL: https://issues.apache.org/jira/browse/SPARK-32168
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ryan Blue
>Assignee: Ryan Blue
>Priority: Blocker
>  Labels: correctness
> Fix For: 3.0.1, 3.1.0
>
>
> The v2 analyzer rule {{ResolveInsertInto}} tries to detect when a static 
> overwrite and a dynamic overwrite would produce the same result and will 
> choose to use static overwrite in that case. It will only use a dynamic 
> overwrite if there is a partition column without a static value and the SQL 
> mode is set to dynamic.
> {code:lang=scala}
> val dynamicPartitionOverwrite = partCols.size > staticPartitions.size &&
>   conf.partitionOverwriteMode == PartitionOverwriteMode.DYNAMIC
> {code}
> The problem is that {{partCols}} contains only the names of partitions that are
> in the column list (identity partitions) and does not include hidden partitions,
> like {{days(ts)}}. As a result, this check doesn't detect hidden partitions and
> doesn't switch to dynamic overwrite. Static overwrite is used instead; when a
> table has only hidden partitions, the static filter drops all table data.
> This is a correctness bug because Spark will overwrite more data than just 
> the set of partitions being written to in dynamic mode. The impact is limited 
> because this rule is only used for SQL queries (not plans from 
> DataFrameWriters) and only affects tables with hidden partitions.
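
To make the failure mode concrete, a hypothetical sketch (as in spark-shell, where {{spark}} is the SparkSession; the catalog, table, and provider names are made up, and a configured v2 catalog supporting hidden partition transforms such as {{days(ts)}} is assumed):

{code:lang=scala}
// A v2 table whose only partitioning is a hidden transform.
spark.sql("""
  CREATE TABLE testcat.db.events (id BIGINT, ts TIMESTAMP)
  USING foo
  PARTITIONED BY (days(ts))
""")

// Request dynamic partition overwrite for SQL writes.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

// Because days(ts) is not an identity partition column, partCols is empty,
// the partCols.size > staticPartitions.size check fails, and ResolveInsertInto
// plans a static overwrite; per this issue, its static filter then drops all
// existing rows instead of only the day partitions being written.
spark.sql("INSERT OVERWRITE TABLE testcat.db.events SELECT id, ts FROM new_events")
{code}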






[jira] [Assigned] (SPARK-32168) DSv2 SQL overwrite incorrectly uses static plan with hidden partitions

2020-07-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-32168:
-

Assignee: Ryan Blue

> DSv2 SQL overwrite incorrectly uses static plan with hidden partitions
> --
>
> Key: SPARK-32168
> URL: https://issues.apache.org/jira/browse/SPARK-32168
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ryan Blue
>Assignee: Ryan Blue
>Priority: Blocker
>  Labels: correctness
>
> The v2 analyzer rule {{ResolveInsertInto}} tries to detect when a static 
> overwrite and a dynamic overwrite would produce the same result and will 
> choose to use static overwrite in that case. It will only use a dynamic 
> overwrite if there is a partition column without a static value and the SQL 
> mode is set to dynamic.
> {code:lang=scala}
> val dynamicPartitionOverwrite = partCols.size > staticPartitions.size &&
>   conf.partitionOverwriteMode == PartitionOverwriteMode.DYNAMIC
> {code}
> The problem is that {{partCols}} contains only the names of partitions that are
> in the column list (identity partitions) and does not include hidden partitions,
> like {{days(ts)}}. As a result, this check doesn't detect hidden partitions and
> doesn't switch to dynamic overwrite. Static overwrite is used instead; when a
> table has only hidden partitions, the static filter drops all table data.
> This is a correctness bug because Spark will overwrite more data than just 
> the set of partitions being written to in dynamic mode. The impact is limited 
> because this rule is only used for SQL queries (not plans from 
> DataFrameWriters) and only affects tables with hidden partitions.






[jira] [Commented] (SPARK-32024) Disk usage tracker went negative in HistoryServerDiskManager

2020-07-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154051#comment-17154051
 ] 

Apache Spark commented on SPARK-32024:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/29046

> Disk usage tracker went negative in HistoryServerDiskManager
> 
>
> Key: SPARK-32024
> URL: https://issues.apache.org/jira/browse/SPARK-32024
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0, 3.1.0
> Environment: System: Windows, Linux.
> Config:
> spark.history.retainedApplications 200
> spark.history.store.maxDiskUsage 10g
> spark.history.store.path /cache_hs
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Minor
> Fix For: 2.4.7, 3.0.1, 3.1.0
>
>
> After restarting the history server, we would randomly see the error below.
> h2. HTTP ERROR 500 java.lang.IllegalStateException: Disk usage tracker went 
> negative (now = -, delta = -)
> ||URI:|/history//*/stages/|
> ||STATUS:|500|
> ||MESSAGE:|java.lang.IllegalStateException: Disk usage tracker went negative 
> (now = -, delta = -)|
> ||SERVLET:|org.apache.spark.deploy.history.HistoryServer$$anon$1-6ce1f601|
> ||CAUSED BY:|java.lang.IllegalStateException: Disk usage tracker went 
> negative (now = -, delta = -)|
> h3. Caused by:
> java.lang.IllegalStateException: Disk usage tracker went negative (now = 
> -633925, delta = -38947) at 
> org.apache.spark.deploy.history.HistoryServerDiskManager.org$apache$spark$deploy$history$HistoryServerDiskManager$$updateUsage(HistoryServerDiskManager.scala:258)
>  at 
> org.apache.spark.deploy.history.HistoryServerDiskManager$Lease.rollback(HistoryServerDiskManager.scala:316)
>  at 
> org.apache.spark.deploy.history.FsHistoryProvider.loadDiskStore(FsHistoryProvider.scala:1192)
>  at 
> org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:363)
>  at 
> org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:191)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:163)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:135)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:161)
>  at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:56)
>  at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:52)
>  at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>  at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) 
> at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at 
> org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:89)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:101)
>  at 
> org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:248)
>  at 
> org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:101)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
> org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:763) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1631)
>  at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1618)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:549)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1363)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:489)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>  at 
> 

[jira] [Updated] (SPARK-32235) Kubernetes Configuration to set Service Account to Executors

2020-07-08 Thread Pedro Rossi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pedro Rossi updated SPARK-32235:

Description: 
Some cloud providers use Service Accounts to provide resource authorization 
(one example is described here: 
[https://aws.amazon.com/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/]),
 so we need to be able to set Service Accounts on the executors.

My idea for this feature would be a configuration such as 
"spark.kubernetes.authenticate.executor.serviceAccountName" to set the 
executors' Service Account. This way the driver and the executors could be 
granted different accesses, or the same access, at the user's choice.
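
To make the proposal concrete, here is a minimal sketch of how such a session 
might be configured. This is only an illustration: the executor-side key is the 
one proposed above and does not exist yet, and the master URL, container image, 
and service-account names are assumptions.

{code:python}
from pyspark.sql import SparkSession

# Hypothetical sketch of the proposed setting. Only the driver-side key exists
# in Spark 3.0.0; the executor-side key below is the one proposed in this issue.
spark = (
    SparkSession.builder
    .master("k8s://https://kubernetes.default.svc:443")            # assumed API server
    .config("spark.kubernetes.container.image", "spark-py:3.0.0")  # assumed image
    .config("spark.kubernetes.authenticate.driver.serviceAccountName", "spark-driver-sa")
    # Proposed key (not yet a real configuration): separate account for executor pods.
    .config("spark.kubernetes.authenticate.executor.serviceAccountName", "spark-executor-sa")
    .getOrCreate()
)
{code}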

I am creating this issue so the maintainers can share their opinions first, but 
I also intend to open a pull request to address it.

  was:
Some cloud providers use Service Accounts to provide resource authorization 
(one example is described here 
[https://aws.amazon.com/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/)]
 and for this we need to be able to set Service Accounts to the executors.

My idea for development of this feature would be to have a configuration like 
"spark.kubernetes.authenticate.executor.serviceAccountName" in order to set the 
executors Service Account, this way it could be possible to allow only certain 
accesses to the driver and others to the executors or the same access (user's 
choice)

I am creating this issue so the maintainers can write opinions first, but I 
intend to create a pull request to address this issue also



> Kubernetes Configuration to set Service Account to Executors
> 
>
> Key: SPARK-32235
> URL: https://issues.apache.org/jira/browse/SPARK-32235
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Pedro Rossi
>Priority: Minor
>
> Some cloud providers use Service Accounts to provide resource authorization 
> (one example is described here: 
> [https://aws.amazon.com/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/]),
>  so we need to be able to set Service Accounts on the executors.
> My idea for this feature would be a configuration such as 
> "spark.kubernetes.authenticate.executor.serviceAccountName" to set the 
> executors' Service Account. This way the driver and the executors could be 
> granted different accesses, or the same access, at the user's choice.
> I am creating this issue so the maintainers can share their opinions first, 
> but I also intend to open a pull request to address it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32024) Disk usage tracker went negative in HistoryServerDiskManager

2020-07-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154050#comment-17154050
 ] 

Apache Spark commented on SPARK-32024:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/29046

> Disk usage tracker went negative in HistoryServerDiskManager
> 
>
> Key: SPARK-32024
> URL: https://issues.apache.org/jira/browse/SPARK-32024
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0, 3.1.0
> Environment: System: Windows, Linux.
> Config:
> spark.history.retainedApplications 200
> spark.history.store.maxDiskUsage 10g
> spark.history.store.path /cache_hs
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Minor
> Fix For: 2.4.7, 3.0.1, 3.1.0
>
>
> After restarting the history server, we randomly see the error below.
> h2. HTTP ERROR 500 java.lang.IllegalStateException: Disk usage tracker went 
> negative (now = -, delta = -)
> ||URI:|/history//*/stages/|
> ||STATUS:|500|
> ||MESSAGE:|java.lang.IllegalStateException: Disk usage tracker went negative 
> (now = -, delta = -)|
> ||SERVLET:|org.apache.spark.deploy.history.HistoryServer$$anon$1-6ce1f601|
> ||CAUSED BY:|java.lang.IllegalStateException: Disk usage tracker went 
> negative (now = -, delta = -)|
> h3. Caused by:
> java.lang.IllegalStateException: Disk usage tracker went negative (now = 
> -633925, delta = -38947) at 
> org.apache.spark.deploy.history.HistoryServerDiskManager.org$apache$spark$deploy$history$HistoryServerDiskManager$$updateUsage(HistoryServerDiskManager.scala:258)
>  at 
> org.apache.spark.deploy.history.HistoryServerDiskManager$Lease.rollback(HistoryServerDiskManager.scala:316)
>  at 
> org.apache.spark.deploy.history.FsHistoryProvider.loadDiskStore(FsHistoryProvider.scala:1192)
>  at 
> org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:363)
>  at 
> org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:191)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:163)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:135)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:161)
>  at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:56)
>  at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:52)
>  at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>  at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) 
> at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at 
> org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:89)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:101)
>  at 
> org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:248)
>  at 
> org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:101)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
> org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:763) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1631)
>  at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1618)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:549)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1363)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:489)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>  at 
> 

[jira] [Created] (SPARK-32235) Kubernetes Configuration to set Service Account to Executors

2020-07-08 Thread Pedro Rossi (Jira)
Pedro Rossi created SPARK-32235:
---

 Summary: Kubernetes Configuration to set Service Account to 
Executors
 Key: SPARK-32235
 URL: https://issues.apache.org/jira/browse/SPARK-32235
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.0.0
Reporter: Pedro Rossi


Some cloud providers use Service Accounts to provide resource authorization 
(one example is described here: 
[https://aws.amazon.com/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/]),
 so we need to be able to set Service Accounts on the executors.

My idea for this feature would be a configuration such as 
"spark.kubernetes.authenticate.executor.serviceAccountName" to set the 
executors' Service Account. This way the driver and the executors could be 
granted different accesses, or the same access, at the user's choice.

I am creating this issue so the maintainers can share their opinions first, but 
I also intend to open a pull request to address it.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32233) Disable SBT unidoc generation testing in Jenkins

2020-07-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-32233.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 29017
[https://github.com/apache/spark/pull/29017]

> Disable SBT unidoc generation testing in Jenkins
> 
>
> Key: SPARK-32233
> URL: https://issues.apache.org/jira/browse/SPARK-32233
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32233) Disable SBT unidoc generation testing in Jenkins

2020-07-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-32233:
-

Assignee: Dongjoon Hyun

> Disable SBT unidoc generation testing in Jenkins
> 
>
> Key: SPARK-32233
> URL: https://issues.apache.org/jira/browse/SPARK-32233
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32167) nullability of GetArrayStructFields is incorrect

2020-07-08 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153989#comment-17153989
 ] 

Dongjoon Hyun commented on SPARK-32167:
---

This landed on branch-2.4 via https://github.com/apache/spark/pull/29019.

> nullability of GetArrayStructFields is incorrect
> 
>
> Key: SPARK-32167
> URL: https://issues.apache.org/jira/browse/SPARK-32167
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Blocker
>  Labels: correctness
> Fix For: 3.0.1, 3.1.0
>
>
> The following should be `Array([WrappedArray(1, null)])` instead of 
> `Array([WrappedArray(1, 0)])`
> {code:java}
> import scala.collection.JavaConverters._
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types.{ArrayType, StructType}
> val innerStruct = new StructType().add("i", "int", nullable = true)
> val schema = new StructType().add("arr", ArrayType(innerStruct, containsNull 
> = false))
> val df = spark.createDataFrame(List(Row(Seq(Row(1), Row(null)))).asJava, schema)
> df.select($"arr".getField("i")).collect
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32167) nullability of GetArrayStructFields is incorrect

2020-07-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-32167:
--
Fix Version/s: 2.4.7

> nullability of GetArrayStructFields is incorrect
> 
>
> Key: SPARK-32167
> URL: https://issues.apache.org/jira/browse/SPARK-32167
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Blocker
>  Labels: correctness
> Fix For: 2.4.7, 3.0.1, 3.1.0
>
>
> The following should be `Array([WrappedArray(1, null)])` instead of 
> `Array([WrappedArray(1, 0)])`
> {code:java}
> import scala.collection.JavaConverters._
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types.{ArrayType, StructType}
> val innerStruct = new StructType().add("i", "int", nullable = true)
> val schema = new StructType().add("arr", ArrayType(innerStruct, containsNull 
> = false))
> val df = spark.createDataFrame(List(Row(Seq(Row(1), Row(null)))).asJava, schema)
> df.select($"arr".getField("i")).collect
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32234) Spark sql commands are failing on select Queries for the orc tables

2020-07-08 Thread Saurabh Chawla (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh Chawla updated SPARK-32234:
---
Description: 
Spark sql commands are failing on select Queries for the orc tables

Steps to reproduce

 
{code:java}
val table = """CREATE TABLE `date_dim` (
  `d_date_sk` INT,
  `d_date_id` STRING,
  `d_date` TIMESTAMP,
  `d_month_seq` INT,
  `d_week_seq` INT,
  `d_quarter_seq` INT,
  `d_year` INT,
  `d_dow` INT,
  `d_moy` INT,
  `d_dom` INT,
  `d_qoy` INT,
  `d_fy_year` INT,
  `d_fy_quarter_seq` INT,
  `d_fy_week_seq` INT,
  `d_day_name` STRING,
  `d_quarter_name` STRING,
  `d_holiday` STRING,
  `d_weekend` STRING,
  `d_following_holiday` STRING,
  `d_first_dom` INT,
  `d_last_dom` INT,
  `d_same_day_ly` INT,
  `d_same_day_lq` INT,
  `d_current_day` STRING,
  `d_current_week` STRING,
  `d_current_month` STRING,
  `d_current_quarter` STRING,
  `d_current_year` STRING)
USING orc
LOCATION '/Users/test/tpcds_scale5data/date_dim'
TBLPROPERTIES (
  'transient_lastDdlTime' = '1574682806')"""

spark.sql(table).collect

val u = """select date_dim.d_date_id from date_dim limit 5"""

spark.sql(u).collect
{code}
 

 

Exception

 
{code:java}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 
2, 192.168.0.103, executor driver): java.lang.ArrayIndexOutOfBoundsException: 1
at 
org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.initBatch(OrcColumnarBatchReader.java:156)
at 
org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$7(OrcFileFormat.scala:258)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:141)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:203)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116)
at 
org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:620)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:343)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:895)
at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:895)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:372)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:336)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:133)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:445)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1489)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:448)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

{code}
 

 

The reason is that initBatch does not get the schema it needs to resolve the 
column values in OrcFileFormat.scala:

 
{code:java}
batchReader.initBatch(
 TypeDescription.fromString(resultSchemaString){code}
 

The query works if written as 
{code:java}
val u = """select * from date_dim limit 5"""{code}
 

  was:
Spark sql commands are failing on selecting the orc tables

Steps to reproduce

 
{code:java}
val table = """CREATE TABLE `date_dim` (
  `d_date_sk` INT,
  `d_date_id` STRING,
  `d_date` TIMESTAMP,
  `d_month_seq` INT,
  `d_week_seq` INT,
  `d_quarter_seq` INT,
  `d_year` INT,
  `d_dow` INT,
  `d_moy` INT,
  `d_dom` INT,
  `d_qoy` INT,
  `d_fy_year` INT,
  `d_fy_quarter_seq` INT,
  `d_fy_week_seq` INT,
  `d_day_name` STRING,
  `d_quarter_name` STRING,
  `d_holiday` STRING,
  `d_weekend` STRING,
  `d_following_holiday` STRING,
  `d_first_dom` INT,
  `d_last_dom` INT,
  `d_same_day_ly` INT,
  `d_same_day_lq` INT,
  `d_current_day` STRING,
  `d_current_week` STRING,
  `d_current_month` STRING,
  `d_current_quarter` STRING,
  `d_current_year` STRING)
USING orc
LOCATION '/Users/test/tpcds_scale5data/date_dim'
TBLPROPERTIES (
  'transient_lastDdlTime' = '1574682806')"""


[jira] [Updated] (SPARK-32234) Spark sql commands are failing on select Queries for the orc tables

2020-07-08 Thread Saurabh Chawla (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh Chawla updated SPARK-32234:
---
Summary: Spark sql commands are failing on select Queries for the  orc 
tables  (was: Spark sql commands are failing on selecting the  orc tables)

> Spark sql commands are failing on select Queries for the  orc tables
> 
>
> Key: SPARK-32234
> URL: https://issues.apache.org/jira/browse/SPARK-32234
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Saurabh Chawla
>Priority: Major
>
> Spark sql commands are failing on selecting the orc tables
> Steps to reproduce
>  
> {code:java}
> val table = """CREATE TABLE `date_dim` (
>   `d_date_sk` INT,
>   `d_date_id` STRING,
>   `d_date` TIMESTAMP,
>   `d_month_seq` INT,
>   `d_week_seq` INT,
>   `d_quarter_seq` INT,
>   `d_year` INT,
>   `d_dow` INT,
>   `d_moy` INT,
>   `d_dom` INT,
>   `d_qoy` INT,
>   `d_fy_year` INT,
>   `d_fy_quarter_seq` INT,
>   `d_fy_week_seq` INT,
>   `d_day_name` STRING,
>   `d_quarter_name` STRING,
>   `d_holiday` STRING,
>   `d_weekend` STRING,
>   `d_following_holiday` STRING,
>   `d_first_dom` INT,
>   `d_last_dom` INT,
>   `d_same_day_ly` INT,
>   `d_same_day_lq` INT,
>   `d_current_day` STRING,
>   `d_current_week` STRING,
>   `d_current_month` STRING,
>   `d_current_quarter` STRING,
>   `d_current_year` STRING)
> USING orc
> LOCATION '/Users/test/tpcds_scale5data/date_dim'
> TBLPROPERTIES (
>   'transient_lastDdlTime' = '1574682806')"""
> spark.sql(table).collect
> val u = """select date_dim.d_date_id from date_dim limit 5"""
> spark.sql(u).collect
> {code}
>  
>  
> Exception
>  
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 
> (TID 2, 192.168.0.103, executor driver): 
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.initBatch(OrcColumnarBatchReader.java:156)
> at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$7(OrcFileFormat.scala:258)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:141)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:203)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116)
> at 
> org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:620)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown
>  Source)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:343)
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:895)
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:895)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:372)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:336)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> at org.apache.spark.scheduler.Task.run(Task.scala:133)
> at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:445)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1489)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:448)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  
> The reason is that initBatch does not get the schema it needs to resolve the 
> column values in OrcFileFormat.scala:
>  
> {code:java}
> batchReader.initBatch(
>  TypeDescription.fromString(resultSchemaString){code}
>  
> The query works if written as 
> {code:java}
> val u = """select * from date_dim limit 5"""{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To 

[jira] [Commented] (SPARK-32234) Spark sql commands are failing on selecting the orc tables

2020-07-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153958#comment-17153958
 ] 

Apache Spark commented on SPARK-32234:
--

User 'SaurabhChawla100' has created a pull request for this issue:
https://github.com/apache/spark/pull/29045

> Spark sql commands are failing on selecting the  orc tables
> ---
>
> Key: SPARK-32234
> URL: https://issues.apache.org/jira/browse/SPARK-32234
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Saurabh Chawla
>Priority: Major
>
> Spark sql commands are failing on selecting the orc tables
> Steps to reproduce
>  
> {code:java}
> val table = """CREATE TABLE `date_dim` (
>   `d_date_sk` INT,
>   `d_date_id` STRING,
>   `d_date` TIMESTAMP,
>   `d_month_seq` INT,
>   `d_week_seq` INT,
>   `d_quarter_seq` INT,
>   `d_year` INT,
>   `d_dow` INT,
>   `d_moy` INT,
>   `d_dom` INT,
>   `d_qoy` INT,
>   `d_fy_year` INT,
>   `d_fy_quarter_seq` INT,
>   `d_fy_week_seq` INT,
>   `d_day_name` STRING,
>   `d_quarter_name` STRING,
>   `d_holiday` STRING,
>   `d_weekend` STRING,
>   `d_following_holiday` STRING,
>   `d_first_dom` INT,
>   `d_last_dom` INT,
>   `d_same_day_ly` INT,
>   `d_same_day_lq` INT,
>   `d_current_day` STRING,
>   `d_current_week` STRING,
>   `d_current_month` STRING,
>   `d_current_quarter` STRING,
>   `d_current_year` STRING)
> USING orc
> LOCATION '/Users/test/tpcds_scale5data/date_dim'
> TBLPROPERTIES (
>   'transient_lastDdlTime' = '1574682806')"""
> spark.sql(table).collect
> val u = """select date_dim.d_date_id from date_dim limit 5"""
> spark.sql(u).collect
> {code}
>  
>  
> Exception
>  
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 
> (TID 2, 192.168.0.103, executor driver): 
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.initBatch(OrcColumnarBatchReader.java:156)
> at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$7(OrcFileFormat.scala:258)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:141)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:203)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116)
> at 
> org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:620)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown
>  Source)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:343)
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:895)
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:895)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:372)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:336)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> at org.apache.spark.scheduler.Task.run(Task.scala:133)
> at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:445)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1489)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:448)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  
> The reason is that initBatch does not get the schema it needs to resolve the 
> column values in OrcFileFormat.scala:
>  
> {code:java}
> batchReader.initBatch(
>  TypeDescription.fromString(resultSchemaString){code}
>  
> The query works if written as 
> {code:java}
> val u = """select * from date_dim limit 5"""{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: 

[jira] [Commented] (SPARK-32234) Spark sql commands are failing on selecting the orc tables

2020-07-08 Thread Saurabh Chawla (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153953#comment-17153953
 ] 

Saurabh Chawla commented on SPARK-32234:


creating the PR for that change [https://github.com/apache/spark/pull/29045]

> Spark sql commands are failing on selecting the  orc tables
> ---
>
> Key: SPARK-32234
> URL: https://issues.apache.org/jira/browse/SPARK-32234
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Saurabh Chawla
>Priority: Major
>
> Spark sql commands are failing on selecting the orc tables
> Steps to reproduce
>  
> {code:java}
> val table = """CREATE TABLE `date_dim` (
>   `d_date_sk` INT,
>   `d_date_id` STRING,
>   `d_date` TIMESTAMP,
>   `d_month_seq` INT,
>   `d_week_seq` INT,
>   `d_quarter_seq` INT,
>   `d_year` INT,
>   `d_dow` INT,
>   `d_moy` INT,
>   `d_dom` INT,
>   `d_qoy` INT,
>   `d_fy_year` INT,
>   `d_fy_quarter_seq` INT,
>   `d_fy_week_seq` INT,
>   `d_day_name` STRING,
>   `d_quarter_name` STRING,
>   `d_holiday` STRING,
>   `d_weekend` STRING,
>   `d_following_holiday` STRING,
>   `d_first_dom` INT,
>   `d_last_dom` INT,
>   `d_same_day_ly` INT,
>   `d_same_day_lq` INT,
>   `d_current_day` STRING,
>   `d_current_week` STRING,
>   `d_current_month` STRING,
>   `d_current_quarter` STRING,
>   `d_current_year` STRING)
> USING orc
> LOCATION '/Users/test/tpcds_scale5data/date_dim'
> TBLPROPERTIES (
>   'transient_lastDdlTime' = '1574682806')"""
> spark.sql(table).collect
> val u = """select date_dim.d_date_id from date_dim limit 5"""
> spark.sql(u).collect
> {code}
>  
>  
> Exception
>  
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 
> (TID 2, 192.168.0.103, executor driver): 
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.initBatch(OrcColumnarBatchReader.java:156)
> at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$7(OrcFileFormat.scala:258)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:141)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:203)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116)
> at 
> org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:620)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown
>  Source)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:343)
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:895)
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:895)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:372)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:336)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> at org.apache.spark.scheduler.Task.run(Task.scala:133)
> at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:445)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1489)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:448)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  
> The reason is that initBatch does not get the schema it needs to resolve the 
> column values in OrcFileFormat.scala:
>  
> {code:java}
> batchReader.initBatch(
>  TypeDescription.fromString(resultSchemaString){code}
>  
> The query works if written as 
> {code:java}
> val u = """select * from date_dim limit 5"""{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: 

[jira] [Assigned] (SPARK-32234) Spark sql commands are failing on selecting the orc tables

2020-07-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32234:


Assignee: (was: Apache Spark)

> Spark sql commands are failing on selecting the  orc tables
> ---
>
> Key: SPARK-32234
> URL: https://issues.apache.org/jira/browse/SPARK-32234
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Saurabh Chawla
>Priority: Major
>
> Spark sql commands are failing on selecting the orc tables
> Steps to reproduce
>  
> {code:java}
> val table = """CREATE TABLE `date_dim` (
>   `d_date_sk` INT,
>   `d_date_id` STRING,
>   `d_date` TIMESTAMP,
>   `d_month_seq` INT,
>   `d_week_seq` INT,
>   `d_quarter_seq` INT,
>   `d_year` INT,
>   `d_dow` INT,
>   `d_moy` INT,
>   `d_dom` INT,
>   `d_qoy` INT,
>   `d_fy_year` INT,
>   `d_fy_quarter_seq` INT,
>   `d_fy_week_seq` INT,
>   `d_day_name` STRING,
>   `d_quarter_name` STRING,
>   `d_holiday` STRING,
>   `d_weekend` STRING,
>   `d_following_holiday` STRING,
>   `d_first_dom` INT,
>   `d_last_dom` INT,
>   `d_same_day_ly` INT,
>   `d_same_day_lq` INT,
>   `d_current_day` STRING,
>   `d_current_week` STRING,
>   `d_current_month` STRING,
>   `d_current_quarter` STRING,
>   `d_current_year` STRING)
> USING orc
> LOCATION '/Users/test/tpcds_scale5data/date_dim'
> TBLPROPERTIES (
>   'transient_lastDdlTime' = '1574682806')"""
> spark.sql(table).collect
> val u = """select date_dim.d_date_id from date_dim limit 5"""
> spark.sql(u).collect
> {code}
>  
>  
> Exception
>  
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 
> (TID 2, 192.168.0.103, executor driver): 
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.initBatch(OrcColumnarBatchReader.java:156)
> at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$7(OrcFileFormat.scala:258)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:141)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:203)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116)
> at 
> org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:620)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown
>  Source)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:343)
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:895)
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:895)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:372)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:336)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> at org.apache.spark.scheduler.Task.run(Task.scala:133)
> at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:445)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1489)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:448)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  
> The reason is that initBatch does not get the schema it needs to resolve the 
> column values in OrcFileFormat.scala:
>  
> {code:java}
> batchReader.initBatch(
>  TypeDescription.fromString(resultSchemaString){code}
>  
> The query works if written as 
> {code:java}
> val u = """select * from date_dim limit 5"""{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32234) Spark sql commands are failing on selecting the orc tables

2020-07-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153954#comment-17153954
 ] 

Apache Spark commented on SPARK-32234:
--

User 'SaurabhChawla100' has created a pull request for this issue:
https://github.com/apache/spark/pull/29045

> Spark sql commands are failing on selecting the  orc tables
> ---
>
> Key: SPARK-32234
> URL: https://issues.apache.org/jira/browse/SPARK-32234
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Saurabh Chawla
>Priority: Major
>
> Spark sql commands are failing on selecting the orc tables
> Steps to reproduce
>  
> {code:java}
> val table = """CREATE TABLE `date_dim` (
>   `d_date_sk` INT,
>   `d_date_id` STRING,
>   `d_date` TIMESTAMP,
>   `d_month_seq` INT,
>   `d_week_seq` INT,
>   `d_quarter_seq` INT,
>   `d_year` INT,
>   `d_dow` INT,
>   `d_moy` INT,
>   `d_dom` INT,
>   `d_qoy` INT,
>   `d_fy_year` INT,
>   `d_fy_quarter_seq` INT,
>   `d_fy_week_seq` INT,
>   `d_day_name` STRING,
>   `d_quarter_name` STRING,
>   `d_holiday` STRING,
>   `d_weekend` STRING,
>   `d_following_holiday` STRING,
>   `d_first_dom` INT,
>   `d_last_dom` INT,
>   `d_same_day_ly` INT,
>   `d_same_day_lq` INT,
>   `d_current_day` STRING,
>   `d_current_week` STRING,
>   `d_current_month` STRING,
>   `d_current_quarter` STRING,
>   `d_current_year` STRING)
> USING orc
> LOCATION '/Users/test/tpcds_scale5data/date_dim'
> TBLPROPERTIES (
>   'transient_lastDdlTime' = '1574682806')"""
> spark.sql(table).collect
> val u = """select date_dim.d_date_id from date_dim limit 5"""
> spark.sql(u).collect
> {code}
>  
>  
> Exception
>  
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 
> (TID 2, 192.168.0.103, executor driver): 
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.initBatch(OrcColumnarBatchReader.java:156)
> at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$7(OrcFileFormat.scala:258)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:141)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:203)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116)
> at 
> org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:620)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown
>  Source)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:343)
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:895)
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:895)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:372)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:336)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> at org.apache.spark.scheduler.Task.run(Task.scala:133)
> at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:445)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1489)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:448)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  
> The reason is that initBatch does not get the schema it needs to resolve the 
> column values in OrcFileFormat.scala:
>  
> {code:java}
> batchReader.initBatch(
>  TypeDescription.fromString(resultSchemaString){code}
>  
> The query works if written as 
> {code:java}
> val u = """select * from date_dim limit 5"""{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: 

[jira] [Assigned] (SPARK-32234) Spark sql commands are failing on selecting the orc tables

2020-07-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32234:


Assignee: Apache Spark

> Spark sql commands are failing on selecting the  orc tables
> ---
>
> Key: SPARK-32234
> URL: https://issues.apache.org/jira/browse/SPARK-32234
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Saurabh Chawla
>Assignee: Apache Spark
>Priority: Major
>
> Spark sql commands are failing on selecting the orc tables
> Steps to reproduce
>  
> {code:java}
> val table = """CREATE TABLE `date_dim` (
>   `d_date_sk` INT,
>   `d_date_id` STRING,
>   `d_date` TIMESTAMP,
>   `d_month_seq` INT,
>   `d_week_seq` INT,
>   `d_quarter_seq` INT,
>   `d_year` INT,
>   `d_dow` INT,
>   `d_moy` INT,
>   `d_dom` INT,
>   `d_qoy` INT,
>   `d_fy_year` INT,
>   `d_fy_quarter_seq` INT,
>   `d_fy_week_seq` INT,
>   `d_day_name` STRING,
>   `d_quarter_name` STRING,
>   `d_holiday` STRING,
>   `d_weekend` STRING,
>   `d_following_holiday` STRING,
>   `d_first_dom` INT,
>   `d_last_dom` INT,
>   `d_same_day_ly` INT,
>   `d_same_day_lq` INT,
>   `d_current_day` STRING,
>   `d_current_week` STRING,
>   `d_current_month` STRING,
>   `d_current_quarter` STRING,
>   `d_current_year` STRING)
> USING orc
> LOCATION '/Users/test/tpcds_scale5data/date_dim'
> TBLPROPERTIES (
>   'transient_lastDdlTime' = '1574682806')"""
> spark.sql(table).collect
> val u = """select date_dim.d_date_id from date_dim limit 5"""
> spark.sql(u).collect
> {code}
>  
>  
> Exception
>  
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 
> (TID 2, 192.168.0.103, executor driver): 
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.initBatch(OrcColumnarBatchReader.java:156)
> at 
> org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$7(OrcFileFormat.scala:258)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:141)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:203)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116)
> at 
> org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:620)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown
>  Source)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:343)
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:895)
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:895)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:372)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:336)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> at org.apache.spark.scheduler.Task.run(Task.scala:133)
> at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:445)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1489)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:448)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  
> The reason is that initBatch does not get the schema it needs to resolve the 
> column values in OrcFileFormat.scala:
>  
> {code:java}
> batchReader.initBatch(
>  TypeDescription.fromString(resultSchemaString){code}
>  
> The query works if written as 
> {code:java}
> val u = """select * from date_dim limit 5"""{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: 

[jira] [Created] (SPARK-32234) Spark sql commands are failing on selecting the orc tables

2020-07-08 Thread Saurabh Chawla (Jira)
Saurabh Chawla created SPARK-32234:
--

 Summary: Spark sql commands are failing on selecting the  orc 
tables
 Key: SPARK-32234
 URL: https://issues.apache.org/jira/browse/SPARK-32234
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Saurabh Chawla


Spark sql commands are failing on selecting the orc tables

Steps to reproduce

 
{code:java}
val table = """CREATE TABLE `date_dim` (
  `d_date_sk` INT,
  `d_date_id` STRING,
  `d_date` TIMESTAMP,
  `d_month_seq` INT,
  `d_week_seq` INT,
  `d_quarter_seq` INT,
  `d_year` INT,
  `d_dow` INT,
  `d_moy` INT,
  `d_dom` INT,
  `d_qoy` INT,
  `d_fy_year` INT,
  `d_fy_quarter_seq` INT,
  `d_fy_week_seq` INT,
  `d_day_name` STRING,
  `d_quarter_name` STRING,
  `d_holiday` STRING,
  `d_weekend` STRING,
  `d_following_holiday` STRING,
  `d_first_dom` INT,
  `d_last_dom` INT,
  `d_same_day_ly` INT,
  `d_same_day_lq` INT,
  `d_current_day` STRING,
  `d_current_week` STRING,
  `d_current_month` STRING,
  `d_current_quarter` STRING,
  `d_current_year` STRING)
USING orc
LOCATION '/Users/test/tpcds_scale5data/date_dim'
TBLPROPERTIES (
  'transient_lastDdlTime' = '1574682806')"""

spark.sql(table).collect

val u = """select date_dim.d_date_id from date_dim limit 5"""

spark.sql(u).collect
{code}
 

 

Exception

 
{code:java}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 
2, 192.168.0.103, executor driver): java.lang.ArrayIndexOutOfBoundsException: 1
at 
org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.initBatch(OrcColumnarBatchReader.java:156)
at 
org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$7(OrcFileFormat.scala:258)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:141)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:203)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116)
at 
org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:620)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:343)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:895)
at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:895)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:372)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:336)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:133)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:445)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1489)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:448)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

{code}
 

 

The reason is that initBatch does not get the schema it needs to resolve the 
column values in OrcFileFormat.scala:

 
{code:java}
batchReader.initBatch(
 TypeDescription.fromString(resultSchemaString){code}
 

The query works if written as 
{code:java}
val u = """select * from date_dim limit 5"""{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32174) toPandas attempted Arrow optimization but has reached an error and can not continue

2020-07-08 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved SPARK-32174.
--
Resolution: Not A Problem

Great, I will mark this as resolved then.  We should add the configuration 
example you used to the docs somewhere as well since I'm sure others will hit 
this.
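
For reference, a sketch of the configuration presumably meant here (an 
assumption on my part, not a quote of the reporter's exact settings): on JDK 9+ 
the Arrow/Netty allocator needs reflective access, so 
-Dio.netty.tryReflectionSetAccessible=true has to reach both the driver and the 
executor JVMs.

{code:python}
import numpy as np
import pandas as pd
from pyspark.sql import SparkSession

# Presumed workaround (assumption): pass the Netty reflection flag to the executors.
# Note: in client mode the driver JVM is already running, so the driver-side flag
# usually has to go into spark-defaults.conf or spark-submit --driver-java-options
# rather than here.
spark = (
    SparkSession.builder.master("spark://10.0.1.40:7077")
    .config("spark.sql.execution.arrow.pyspark.enabled", "true")
    .config("spark.executor.extraJavaOptions", "-Dio.netty.tryReflectionSetAccessible=true")
    .appName("test_arrow")
    .getOrCreate()
)

pdf = pd.DataFrame(np.random.rand(100, 3))
df = spark.createDataFrame(pdf)          # pandas -> Spark via Arrow
result_pdf = df.select("*").toPandas()   # Spark -> pandas via Arrow
{code}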

> toPandas attempted Arrow optimization but has reached an error and can not 
> continue
> ---
>
> Key: SPARK-32174
> URL: https://issues.apache.org/jira/browse/SPARK-32174
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, PySpark
>Affects Versions: 3.0.0
> Environment: Spark 3.0.0, running in *stand-alone* mode
>Reporter: Ramin Hazegh
>Priority: Major
>
> h4. Converting a dataframe to Panda data frame using toPandas() fails.
>  
> *Spark 3.0.0 Running in stand-alone mode* using docker containers based on 
> jupyter docker stack here:
> [https://github.com/jupyter/docker-stacks/blob/master/pyspark-notebook/Dockerfile]
>  
> $ conda list | grep arrow
>  arrow-cpp 0.17.1 py38h1234567_5_cpu conda-forge
>  pyarrow 0.17.1 py38h1234567_5_cpu conda-forge
> $ conda list | grep pandas
>  pandas 1.0.5 py38hcb8c335_0 conda-forge
>  
> *To reproduce:*
> {code:java}
> import numpy as np
> import pandas as pd
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.master("spark://10.0.1.40:7077") \
> .config("spark.sql.execution.arrow.enabled", "true") \
> .appName('test_arrow') \
> .getOrCreate()
> 
> # Generate a pandas DataFrame
> pdf = pd.DataFrame(np.random.rand(100, 3))
> # Create a Spark DataFrame from a pandas DataFrame using Arrow
> df = spark.createDataFrame(pdf)
> # Convert the Spark DataFrame back to a pandas DataFrame using Arrow
> result_pdf = df.select("*").toPandas()
> {code}
>  
> ==
> /usr/local/spark/python/pyspark/sql/pandas/conversion.py:134: UserWarning: 
> toPandas attempted Arrow optimization because 
> 'spark.sql.execution.arrow.pyspark.enabled' is set to true, but has reached 
> the error below and can not continue. Note that 
> 'spark.sql.execution.arrow.pyspark.fallback.enabled' does not have an effect 
> on failures in the middle of computation.
>  An error occurred while calling o55.getResult.
>  : org.apache.spark.SparkException: Exception thrown in awaitResult: 
>  at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:302)
>  at 
> org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:88)
>  at 
> org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:84)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>  at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>  at py4j.Gateway.invoke(Gateway.java:282)
>  at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>  at py4j.commands.CallCommand.execute(CallCommand.java:79)
>  at py4j.GatewayConnection.run(GatewayConnection.java:238)
>  at java.base/java.lang.Thread.run(Thread.java:834)
>  Caused by: org.apache.spark.SparkException: Job aborted due to stage 
> failure: Task 14 in stage 0.0 failed 4 times, most recent failure: Lost task 
> 14.3 in stage 0.0 (TID 31, 10.0.1.43, executor 0): 
> java.lang.UnsupportedOperationException: sun.misc.Unsafe or 
> java.nio.DirectByteBuffer.(long, int) not available
>  at 
> io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:490)
>  at io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:243)
>  at io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:233)
>  at io.netty.buffer.ArrowBuf.nioBuffer(ArrowBuf.java:245)
>  at org.apache.arrow.vector.ipc.ReadChannel.readFully(ReadChannel.java:81)
>  at 
> org.apache.arrow.vector.ipc.message.MessageSerializer.readMessageBody(MessageSerializer.java:696)
>  at 
> org.apache.arrow.vector.ipc.message.MessageSerializer.deserializeRecordBatch(MessageSerializer.java:344)
>  at 
> org.apache.spark.sql.execution.arrow.ArrowConverters$.loadBatch(ArrowConverters.scala:189)
>  at 
> org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$2.nextBatch(ArrowConverters.scala:165)
>  at 
> org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$2.(ArrowConverters.scala:144)
>  at 
> org.apache.spark.sql.execution.arrow.ArrowConverters$.fromBatchIterator(ArrowConverters.scala:143)
>  at 
> 

[jira] [Commented] (SPARK-32232) IllegalArgumentException: MultilayerPerceptronClassifier_... parameter solver given invalid value auto

2020-07-08 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153769#comment-17153769
 ] 

Sean R. Owen commented on SPARK-32232:
--

A few more notes: the equivalent code in Scala works fine, so this is related to 
the PySpark wrappers.
The serialized model looks fine even in this case; it records a default solver 
param of 'l-bfgs' and no overridden value.
When loaded back, the solver is 'auto', the default in HasSolver. I think 
something in MultilayerPerceptronParams doesn't quite override this as 
intended. It does have a solver member, but I'm not sure whether something else 
is missing.
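
If it helps with debugging, here is a minimal sketch for inspecting the solver param on a loaded model; the save path is a placeholder, and forcing the param back to 'l-bfgs' is only a possible workaround, not the fix:

{code:python}
from pyspark.ml.classification import MultilayerPerceptronClassificationModel

# Placeholder path where a model trained as in the description was saved.
model2 = MultilayerPerceptronClassificationModel.load("/tmp/mlp_model")

# On 3.0.0 this reportedly returns 'auto' (the HasSolver default) instead of 'l-bfgs'.
print(model2.getOrDefault(model2.solver))

# Possible workaround until the wrapper is fixed: set a valid value before transform().
model2.set(model2.solver, "l-bfgs")
{code}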

> IllegalArgumentException: MultilayerPerceptronClassifier_... parameter solver 
> given invalid value auto
> --
>
> Key: SPARK-32232
> URL: https://issues.apache.org/jira/browse/SPARK-32232
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 3.0.0
>Reporter: steven taylor
>Priority: Major
>
> I believe I have discovered a bug when loading 
> MultilayerPerceptronClassificationModel in Spark 3.0.0, Scala 2.12, which I 
> have tested and can see is not there in at least Spark 2.4.3, Scala 2.11.
> (I'm not sure if the Scala version is important.)
>  
> I am using PySpark on a Databricks cluster and importing the library "from 
> pyspark.ml.classification import MultilayerPerceptronClassificationModel".
>  
> When running model = MultilayerPerceptronClassificationModel.load(path) and then 
> model.transform(df) I get the following error: IllegalArgumentException: 
> MultilayerPerceptronClassifier_8055d1368e78 parameter solver given invalid 
> value auto.
>  
>  
> This issue can be easily replicated by running the example given in the Spark 
> documentation: 
> [http://spark.apache.org/docs/latest/ml-classification-regression.html#multilayer-perceptron-classifier]
>  
> Then add save-model, load-model, and transform statements, as shown below:
> {code:python}
> # Assumes an active SparkSession named `spark`, e.g. in a pyspark shell.
> from pyspark.ml.classification import MultilayerPerceptronClassifier
> from pyspark.ml.evaluation import MulticlassClassificationEvaluator
> 
> # Load training data
> data = spark.read.format("libsvm")\
>     .load("data/mllib/sample_multiclass_classification_data.txt")
> 
> # Split the data into train and test
> splits = data.randomSplit([0.6, 0.4], 1234)
> train = splits[0]
> test = splits[1]
> 
> # specify layers for the neural network:
> # input layer of size 4 (features), two intermediate of size 5 and 4
> # and output of size 3 (classes)
> layers = [4, 5, 4, 3]
> 
> # create the trainer and set its parameters
> trainer = MultilayerPerceptronClassifier(maxIter=100, layers=layers,
>                                          blockSize=128, seed=1234)
> 
> # train the model
> model = trainer.fit(train)
> 
> # compute accuracy on the test set
> result = model.transform(test)
> predictionAndLabels = result.select("prediction", "label")
> evaluator = MulticlassClassificationEvaluator(metricName="accuracy")
> print("Test set accuracy = " + str(evaluator.evaluate(predictionAndLabels)))
> 
> # save the model, load it back, and transform again
> from pyspark.ml.classification import MultilayerPerceptronClassificationModel
> Save_location = "/tmp/mlp_model"  # placeholder path
> model.save(Save_location)
> model2 = MultilayerPerceptronClassificationModel.load(Save_location)
> 
> result_from_loaded = model2.transform(test)
> {code}
> 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32174) toPandas attempted Arrow optimization but has reached an error and can not continue

2020-07-08 Thread Ramin Hazegh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153756#comment-17153756
 ] 

Ramin Hazegh commented on SPARK-32174:
--

Thanks for the hint. Setting {{io.netty.tryReflectionSetAccessible}} to true 
seems to have resolved the issue.

For the record, I set the options for both the *driver* and the *executors*:
{code:python}
spark = SparkSession.builder.master("spark://10.0.1.40:7077") \
.config("spark.driver.extraJavaOptions", 
"-Dio.netty.tryReflectionSetAccessible=true") \
.config("spark.executor.extraJavaOptions", 
"-Dio.netty.tryReflectionSetAccessible=true") \
.config("spark.sql.execution.arrow.enabled", "true") \
.appName('test_arrow') \
.getOrCreate()
{code}

> toPandas attempted Arrow optimization but has reached an error and can not 
> continue
> ---
>
> Key: SPARK-32174
> URL: https://issues.apache.org/jira/browse/SPARK-32174
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, PySpark
>Affects Versions: 3.0.0
> Environment: Spark 3.0.0, running in *stand-alone* mode
>Reporter: Ramin Hazegh
>Priority: Major
>
> h4. Converting a Spark DataFrame to a pandas DataFrame using toPandas() fails.
>  
> *Spark 3.0.0 Running in stand-alone mode* using docker containers based on 
> jupyter docker stack here:
> [https://github.com/jupyter/docker-stacks/blob/master/pyspark-notebook/Dockerfile]
>  
> $ conda list | grep arrow
>  *arrow-cpp 0.17.1* py38h1234567_5_cpu conda-forge
>  *pyarrow 0.17.1* py38h1234567_5_cpu conda-forge
> $ conda list | grep pandas
>  *pandas 1.0.5* py38hcb8c335_0 conda-forge
>  
> *To reproduce:*
> {code:python}
> import numpy as np
> import pandas as pd
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.master("spark://10.0.1.40:7077") \
> .config("spark.sql.execution.arrow.enabled", "true") \
> .appName('test_arrow') \
> .getOrCreate()
> 
> # Generate a pandas DataFrame
> pdf = pd.DataFrame(np.random.rand(100, 3))
> # Create a Spark DataFrame from a pandas DataFrame using Arrow
> df = spark.createDataFrame(pdf)
> # Convert the Spark DataFrame back to a pandas DataFrame using Arrow
> result_pdf = df.select("*").toPandas()
> {code}
>  
> ==
> /usr/local/spark/python/pyspark/sql/pandas/conversion.py:134: UserWarning: 
> toPandas attempted Arrow optimization because 
> 'spark.sql.execution.arrow.pyspark.enabled' is set to true, but has reached 
> the error below and can not continue. Note that 
> 'spark.sql.execution.arrow.pyspark.fallback.enabled' does not have an effect 
> on failures in the middle of computation.
>  An error occurred while calling o55.getResult.
>  : org.apache.spark.SparkException: Exception thrown in awaitResult: 
>  at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:302)
>  at 
> org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:88)
>  at 
> org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:84)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>  at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>  at py4j.Gateway.invoke(Gateway.java:282)
>  at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>  at py4j.commands.CallCommand.execute(CallCommand.java:79)
>  at py4j.GatewayConnection.run(GatewayConnection.java:238)
>  at java.base/java.lang.Thread.run(Thread.java:834)
>  Caused by: org.apache.spark.SparkException: Job aborted due to stage 
> failure: Task 14 in stage 0.0 failed 4 times, most recent failure: Lost task 
> 14.3 in stage 0.0 (TID 31, 10.0.1.43, executor 0): 
> java.lang.UnsupportedOperationException: sun.misc.Unsafe or 
> java.nio.DirectByteBuffer.(long, int) not available
>  at 
> io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:490)
>  at io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:243)
>  at io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:233)
>  at io.netty.buffer.ArrowBuf.nioBuffer(ArrowBuf.java:245)
>  at org.apache.arrow.vector.ipc.ReadChannel.readFully(ReadChannel.java:81)
>  at 
> org.apache.arrow.vector.ipc.message.MessageSerializer.readMessageBody(MessageSerializer.java:696)
>  at 
> org.apache.arrow.vector.ipc.message.MessageSerializer.deserializeRecordBatch(MessageSerializer.java:344)
>  at 
> 

[jira] [Commented] (SPARK-32227) Bug in load-spark-env.cmd with Spark 3.0.0

2020-07-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153734#comment-17153734
 ] 

Apache Spark commented on SPARK-32227:
--

User 'warrenzhu25' has created a pull request for this issue:
https://github.com/apache/spark/pull/29044

> Bug in load-spark-env.cmd  with Spark 3.0.0
> ---
>
> Key: SPARK-32227
> URL: https://issues.apache.org/jira/browse/SPARK-32227
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 3.0.0
> Environment: Windows 10
>Reporter: Ihor Bobak
>Priority: Major
> Fix For: 3.0.1
>
> Attachments: load-spark-env.cmd
>
>
> spark-env.cmd  which is located in conf  is not loaded by load-spark-env.cmd.
>  
> *How to reproduce:*
> 1) download spark 3.0.0 without hadoop and extract it
> 2) put a file conf/spark-env.cmd with the following contents (paths are 
> relative to where my hadoop is - in C:\opt\hadoop\hadoop-3.2.1, you may need 
> to change):
>  
> SET JAVA_HOME=C:\opt\Java\jdk1.8.0_241
>  SET HADOOP_HOME=C:\opt\hadoop\hadoop-3.2.1
>  SET HADOOP_CONF_DIR=C:\opt\hadoop\hadoop-3.2.1\conf
>  SET 
> SPARK_DIST_CLASSPATH=C:\opt\hadoop\hadoop-3.2.1\etc\hadoop;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\lib*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\lib*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\lib*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\lib*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce*
>  
> 3) go to the bin directory and run pyspark. You will get an error that 
> log4j can't be found, etc. (the reason is that the environment was indeed not 
> loaded, so it doesn't see where Hadoop with all its jars is).
>  
> *How to fix:*
> Just take the load-spark-env.cmd from Spark version 2.4.3, and everything 
> will work.
> [UPDATE]: I attached a fixed version of load-spark-env.cmd that works fine.
>  
> *What is the difference?*
> I am not a specialist in Windows batch, but defining a function
> :LoadSparkEnv
>  if exist "%SPARK_CONF_DIR%\spark-env.cmd" (
>   call "%SPARK_CONF_DIR%\spark-env.cmd"
>  )
> and then calling it (as was done in 2.4.3) helps.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32227) Bug in load-spark-env.cmd with Spark 3.0.0

2020-07-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32227:


Assignee: (was: Apache Spark)

> Bug in load-spark-env.cmd  with Spark 3.0.0
> ---
>
> Key: SPARK-32227
> URL: https://issues.apache.org/jira/browse/SPARK-32227
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 3.0.0
> Environment: Windows 10
>Reporter: Ihor Bobak
>Priority: Major
> Fix For: 3.0.1
>
> Attachments: load-spark-env.cmd
>
>
> spark-env.cmd  which is located in conf  is not loaded by load-spark-env.cmd.
>  
> *How to reproduce:*
> 1) download spark 3.0.0 without hadoop and extract it
> 2) put a file conf/spark-env.cmd with the following contents (paths are 
> relative to where my hadoop is - in C:\opt\hadoop\hadoop-3.2.1, you may need 
> to change):
>  
> SET JAVA_HOME=C:\opt\Java\jdk1.8.0_241
>  SET HADOOP_HOME=C:\opt\hadoop\hadoop-3.2.1
>  SET HADOOP_CONF_DIR=C:\opt\hadoop\hadoop-3.2.1\conf
>  SET 
> SPARK_DIST_CLASSPATH=C:\opt\hadoop\hadoop-3.2.1\etc\hadoop;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\lib*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\lib*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\lib*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\lib*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce*
>  
> 3) go to the bin directory and run pyspark. You will get an error that 
> log4j can't be found, etc. (the reason is that the environment was indeed not 
> loaded, so it doesn't see where Hadoop with all its jars is).
>  
> *How to fix:*
> Just take the load-spark-env.cmd from Spark version 2.4.3, and everything 
> will work.
> [UPDATE]: I attached a fixed version of load-spark-env.cmd that works fine.
>  
> *What is the difference?*
> I am not a specialist in Windows batch, but defining a function
> :LoadSparkEnv
>  if exist "%SPARK_CONF_DIR%\spark-env.cmd" (
>   call "%SPARK_CONF_DIR%\spark-env.cmd"
>  )
> and then calling it (as was done in 2.4.3) helps.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32227) Bug in load-spark-env.cmd with Spark 3.0.0

2020-07-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153732#comment-17153732
 ] 

Apache Spark commented on SPARK-32227:
--

User 'warrenzhu25' has created a pull request for this issue:
https://github.com/apache/spark/pull/29044

> Bug in load-spark-env.cmd  with Spark 3.0.0
> ---
>
> Key: SPARK-32227
> URL: https://issues.apache.org/jira/browse/SPARK-32227
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 3.0.0
> Environment: Windows 10
>Reporter: Ihor Bobak
>Priority: Major
> Fix For: 3.0.1
>
> Attachments: load-spark-env.cmd
>
>
> spark-env.cmd  which is located in conf  is not loaded by load-spark-env.cmd.
>  
> *How to reproduce:*
> 1) download spark 3.0.0 without hadoop and extract it
> 2) put a file conf/spark-env.cmd with the following contents (paths are 
> relative to where my hadoop is - in C:\opt\hadoop\hadoop-3.2.1, you may need 
> to change):
>  
> SET JAVA_HOME=C:\opt\Java\jdk1.8.0_241
>  SET HADOOP_HOME=C:\opt\hadoop\hadoop-3.2.1
>  SET HADOOP_CONF_DIR=C:\opt\hadoop\hadoop-3.2.1\conf
>  SET 
> SPARK_DIST_CLASSPATH=C:\opt\hadoop\hadoop-3.2.1\etc\hadoop;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\lib*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\lib*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\lib*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\lib*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce*
>  
> 3) go to the bin directory and run pyspark. You will get an error that 
> log4j can't be found, etc. (the reason is that the environment was indeed not 
> loaded, so it doesn't see where Hadoop with all its jars is).
>  
> *How to fix:*
> Just take the load-spark-env.cmd from Spark version 2.4.3, and everything 
> will work.
> [UPDATE]: I attached a fixed version of load-spark-env.cmd that works fine.
>  
> *What is the difference?*
> I am not a specialist in Windows batch, but defining a function
> :LoadSparkEnv
>  if exist "%SPARK_CONF_DIR%\spark-env.cmd" (
>   call "%SPARK_CONF_DIR%\spark-env.cmd"
>  )
> and then calling it (as was done in 2.4.3) helps.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32227) Bug in load-spark-env.cmd with Spark 3.0.0

2020-07-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32227:


Assignee: Apache Spark

> Bug in load-spark-env.cmd  with Spark 3.0.0
> ---
>
> Key: SPARK-32227
> URL: https://issues.apache.org/jira/browse/SPARK-32227
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 3.0.0
> Environment: Windows 10
>Reporter: Ihor Bobak
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.0.1
>
> Attachments: load-spark-env.cmd
>
>
> spark-env.cmd  which is located in conf  is not loaded by load-spark-env.cmd.
>  
> *How to reproduce:*
> 1) download spark 3.0.0 without hadoop and extract it
> 2) put a file conf/spark-env.cmd with the following contents (paths are 
> relative to where my hadoop is - in C:\opt\hadoop\hadoop-3.2.1, you may need 
> to change):
>  
> SET JAVA_HOME=C:\opt\Java\jdk1.8.0_241
>  SET HADOOP_HOME=C:\opt\hadoop\hadoop-3.2.1
>  SET HADOOP_CONF_DIR=C:\opt\hadoop\hadoop-3.2.1\conf
>  SET 
> SPARK_DIST_CLASSPATH=C:\opt\hadoop\hadoop-3.2.1\etc\hadoop;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\lib*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\lib*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\lib*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\lib*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce*
>  
> 3) go to the bin directory and run pyspark. You will get an error that 
> log4j can't be found, etc. (the reason is that the environment was indeed not 
> loaded, so it doesn't see where Hadoop with all its jars is).
>  
> *How to fix:*
> Just take the load-spark-env.cmd from Spark version 2.4.3, and everything 
> will work.
> [UPDATE]: I attached a fixed version of load-spark-env.cmd that works fine.
>  
> *What is the difference?*
> I am not a specialist in Windows batch, but defining a function
> :LoadSparkEnv
>  if exist "%SPARK_CONF_DIR%\spark-env.cmd" (
>   call "%SPARK_CONF_DIR%\spark-env.cmd"
>  )
> and then calling it (as was done in 2.4.3) helps.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32233) Disable SBT unidoc generation testing in Jenkins

2020-07-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32233:


Assignee: (was: Apache Spark)

> Disable SBT unidoc generation testing in Jenkins
> 
>
> Key: SPARK-32233
> URL: https://issues.apache.org/jira/browse/SPARK-32233
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32233) Disable SBT unidoc generation testing in Jenkins

2020-07-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32233:


Assignee: Apache Spark

> Disable SBT unidoc generation testing in Jenkins
> 
>
> Key: SPARK-32233
> URL: https://issues.apache.org/jira/browse/SPARK-32233
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32233) Disable SBT unidoc generation testing in Jenkins

2020-07-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153719#comment-17153719
 ] 

Apache Spark commented on SPARK-32233:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/29017

> Disable SBT unidoc generation testing in Jenkins
> 
>
> Key: SPARK-32233
> URL: https://issues.apache.org/jira/browse/SPARK-32233
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32233) Disable SBT unidoc generation testing in Jenkins

2020-07-08 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-32233:
-

 Summary: Disable SBT unidoc generation testing in Jenkins
 Key: SPARK-32233
 URL: https://issues.apache.org/jira/browse/SPARK-32233
 Project: Spark
  Issue Type: Test
  Components: Tests
Affects Versions: 3.1.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32232) IllegalArgumentException: MultilayerPerceptronClassifier_... parameter solver given invalid value auto

2020-07-08 Thread steven taylor (Jira)
steven taylor created SPARK-32232:
-

 Summary: IllegalArgumentException: 
MultilayerPerceptronClassifier_... parameter solver given invalid value auto
 Key: SPARK-32232
 URL: https://issues.apache.org/jira/browse/SPARK-32232
 Project: Spark
  Issue Type: Bug
  Components: ML
Affects Versions: 3.0.0
Reporter: steven taylor


I believe I have discovered a bug when loading 
MultilayerPerceptronClassificationModel in Spark 3.0.0, Scala 2.12, which I 
have tested and can see is not there in at least Spark 2.4.3, Scala 2.11. (I'm 
not sure if the Scala version is important.)

 

I am using PySpark on a Databricks cluster and importing the library "from 
pyspark.ml.classification import MultilayerPerceptronClassificationModel".

 

When running model = MultilayerPerceptronClassificationModel.load(path) and then 
model.transform(df) I get the following error: IllegalArgumentException: 
MultilayerPerceptronClassifier_8055d1368e78 parameter solver given invalid 
value auto.

 

 

This issue can be easily replicated by running the example given in the Spark 
documentation: 
[http://spark.apache.org/docs/latest/ml-classification-regression.html#multilayer-perceptron-classifier]

 

Then add save-model, load-model, and transform statements, as shown below:

{code:python}
# Assumes an active SparkSession named `spark`, e.g. in a pyspark shell.
from pyspark.ml.classification import MultilayerPerceptronClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# Load training data
data = spark.read.format("libsvm")\
    .load("data/mllib/sample_multiclass_classification_data.txt")

# Split the data into train and test
splits = data.randomSplit([0.6, 0.4], 1234)
train = splits[0]
test = splits[1]

# specify layers for the neural network:
# input layer of size 4 (features), two intermediate of size 5 and 4
# and output of size 3 (classes)
layers = [4, 5, 4, 3]

# create the trainer and set its parameters
trainer = MultilayerPerceptronClassifier(maxIter=100, layers=layers,
                                         blockSize=128, seed=1234)

# train the model
model = trainer.fit(train)

# compute accuracy on the test set
result = model.transform(test)
predictionAndLabels = result.select("prediction", "label")
evaluator = MulticlassClassificationEvaluator(metricName="accuracy")
print("Test set accuracy = " + str(evaluator.evaluate(predictionAndLabels)))

# save the model, load it back, and transform again
from pyspark.ml.classification import MultilayerPerceptronClassificationModel
Save_location = "/tmp/mlp_model"  # placeholder path
model.save(Save_location)
model2 = MultilayerPerceptronClassificationModel.load(Save_location)

result_from_loaded = model2.transform(test)
{code}

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32205) Writing timestamp in mysql gets fails

2020-07-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32205:


Assignee: (was: Apache Spark)

> Writing timestamp in mysql gets fails 
> --
>
> Key: SPARK-32205
> URL: https://issues.apache.org/jira/browse/SPARK-32205
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.3.4
>Reporter: Nilesh Patil
>Priority: Major
>
> When we write to MySQL with a TIMESTAMP column, it supports only the range 
> '1970-01-01 00:00:01' UTC to '2038-01-19 03:14:07'. MySQL has a DATETIME 
> datatype, which has the range '1000-01-01 00:00:00' to '9999-12-31 23:59:59'.
> How can we map the Spark timestamp datatype to the MySQL DATETIME datatype in 
> order to use the wider supported range?
> [https://dev.mysql.com/doc/refman/5.7/en/datetime.html]
>  
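
Not an authoritative answer, but one workaround is to create the MySQL table yourself with DATETIME columns and let Spark append to it, so the existing column types are kept instead of Spark creating a TIMESTAMP column. A minimal sketch, where the JDBC URL, table name and credentials are placeholders:

{code:python}
from datetime import datetime
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A value outside the MySQL TIMESTAMP range but inside the DATETIME range.
df = spark.createDataFrame([(1, datetime(2040, 1, 1, 0, 0, 0))], ["id", "event_time"])

# Assumes the target table was pre-created in MySQL, e.g.:
#   CREATE TABLE events (id INT, event_time DATETIME);
props = {"user": "user", "password": "secret", "driver": "com.mysql.cj.jdbc.Driver"}
df.write.mode("append").jdbc("jdbc:mysql://host:3306/db", "events", properties=props)
{code}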



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32205) Writing timestamp in mysql gets fails

2020-07-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32205:


Assignee: Apache Spark

> Writing timestamp in mysql gets fails 
> --
>
> Key: SPARK-32205
> URL: https://issues.apache.org/jira/browse/SPARK-32205
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.3.4
>Reporter: Nilesh Patil
>Assignee: Apache Spark
>Priority: Major
>
> When we write to MySQL with a TIMESTAMP column, it supports only the range 
> '1970-01-01 00:00:01' UTC to '2038-01-19 03:14:07'. MySQL has a DATETIME 
> datatype, which has the range '1000-01-01 00:00:00' to '9999-12-31 23:59:59'.
> How can we map the Spark timestamp datatype to the MySQL DATETIME datatype in 
> order to use the wider supported range?
> [https://dev.mysql.com/doc/refman/5.7/en/datetime.html]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32205) Writing timestamp in mysql gets fails

2020-07-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153712#comment-17153712
 ] 

Apache Spark commented on SPARK-32205:
--

User 'TJX2014' has created a pull request for this issue:
https://github.com/apache/spark/pull/29043

> Writing timestamp in mysql gets fails 
> --
>
> Key: SPARK-32205
> URL: https://issues.apache.org/jira/browse/SPARK-32205
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.3.4
>Reporter: Nilesh Patil
>Priority: Major
>
> When we write to MySQL with a TIMESTAMP column, it supports only the range 
> '1970-01-01 00:00:01' UTC to '2038-01-19 03:14:07'. MySQL has a DATETIME 
> datatype, which has the range '1000-01-01 00:00:00' to '9999-12-31 23:59:59'.
> How can we map the Spark timestamp datatype to the MySQL DATETIME datatype in 
> order to use the wider supported range?
> [https://dev.mysql.com/doc/refman/5.7/en/datetime.html]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32231) Use Hadoop 3 profile in AppVeyor SparkR build

2020-07-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153668#comment-17153668
 ] 

Apache Spark commented on SPARK-32231:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/29042

> Use Hadoop 3 profile in AppVeyor SparkR build
> -
>
> Key: SPARK-32231
> URL: https://issues.apache.org/jira/browse/SPARK-32231
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, R
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> The AppVeyor build with Hadoop 3 is failing, see
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/33977845
> It will be disabled for now in SPARK-32230.
> We should investigate and re-enable it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32231) Use Hadoop 3 profile in AppVeyor SparkR build

2020-07-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32231:


Assignee: Apache Spark

> Use Hadoop 3 profile in AppVeyor SparkR build
> -
>
> Key: SPARK-32231
> URL: https://issues.apache.org/jira/browse/SPARK-32231
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, R
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> The AppVeyor build with Hadoop 3 is failing, see
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/33977845
> It will be disabled for now in SPARK-32230.
> We should investigate and re-enable it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32231) Use Hadoop 3 profile in AppVeyor SparkR build

2020-07-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32231:


Assignee: Apache Spark

> Use Hadoop 3 profile in AppVeyor SparkR build
> -
>
> Key: SPARK-32231
> URL: https://issues.apache.org/jira/browse/SPARK-32231
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, R
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> The AppVeyor build with Hadoop 3 is failing, see
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/33977845
> It will be disabled for now in SPARK-32230.
> We should investigate and re-enable it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32231) Use Hadoop 3 profile in AppVeyor SparkR build

2020-07-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32231:


Assignee: (was: Apache Spark)

> Use Hadoop 3 profile in AppVeyor SparkR build
> -
>
> Key: SPARK-32231
> URL: https://issues.apache.org/jira/browse/SPARK-32231
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, R
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> The AppVeyor build with Hadoop 3 is failing, see
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/33977845
> It will be disabled for now in SPARK-32230.
> We should investigate and re-enable it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32230) Use Hadoop 2.7 profile in AppVeyor SparkR build

2020-07-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153637#comment-17153637
 ] 

Apache Spark commented on SPARK-32230:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/29040

> Use Hadoop 2.7 profile in AppVeyor SparkR build 
> 
>
> Key: SPARK-32230
> URL: https://issues.apache.org/jira/browse/SPARK-32230
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, R
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> Hadoop 3 is used by default as of SPARK-32058, but the AppVeyor build seems to be failing:
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/33977845
> We should use the Hadoop 2.7 profile in the AppVeyor build for now.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32231) Use Hadoop 3 profile in AppVeyor SparkR build

2020-07-08 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-32231:


 Summary: Use Hadoop 3 profile in AppVeyor SparkR build
 Key: SPARK-32231
 URL: https://issues.apache.org/jira/browse/SPARK-32231
 Project: Spark
  Issue Type: Test
  Components: Project Infra, R
Affects Versions: 3.1.0
Reporter: Hyukjin Kwon


The AppVeyor build with Hadoop 3 is failing, see

https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/33977845

It will be disabled for now in SPARK-32230.

We should investigate and re-enable it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32230) Use Hadoop 2.7 profile in AppVeyor SparkR build

2020-07-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32230:


Assignee: Apache Spark

> Use Hadoop 2.7 profile in AppVeyor SparkR build 
> 
>
> Key: SPARK-32230
> URL: https://issues.apache.org/jira/browse/SPARK-32230
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, R
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> Hadoop 3 is used by default as of SPARK-32058, but the AppVeyor build seems to be failing:
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/33977845
> We should use the Hadoop 2.7 profile in the AppVeyor build for now.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32230) Use Hadoop 2.7 profile in AppVeyor SparkR build

2020-07-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32230:


Assignee: (was: Apache Spark)

> Use Hadoop 2.7 profile in AppVeyor SparkR build 
> 
>
> Key: SPARK-32230
> URL: https://issues.apache.org/jira/browse/SPARK-32230
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, R
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Hadoop 3 is used by default as of SPARK-32058, but the AppVeyor build seems to be failing:
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/33977845
> We should use the Hadoop 2.7 profile in the AppVeyor build for now.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32230) Use Hadoop 2.7 profile in AppVeyor SparkR build

2020-07-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153636#comment-17153636
 ] 

Apache Spark commented on SPARK-32230:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/29040

> Use Hadoop 2.7 profile in AppVeyor SparkR build 
> 
>
> Key: SPARK-32230
> URL: https://issues.apache.org/jira/browse/SPARK-32230
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, R
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Hadoop 3 is used by default as of SPARK-32058, but the AppVeyor build seems to be failing:
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/33977845
> We should use the Hadoop 2.7 profile in the AppVeyor build for now.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22865) Publish Official Apache Spark Docker images

2020-07-08 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-22865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153635#comment-17153635
 ] 

Maciej Bryński commented on SPARK-22865:


Almost 3 years later there is still no official Docker image.

Why?

> Publish Official Apache Spark Docker images
> ---
>
> Key: SPARK-22865
> URL: https://issues.apache.org/jira/browse/SPARK-22865
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, Spark Core
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32230) Use Hadoop 2.7 profile in AppVeyor SparkR build

2020-07-08 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-32230:


 Summary: Use Hadoop 2.7 profile in AppVeyor SparkR build 
 Key: SPARK-32230
 URL: https://issues.apache.org/jira/browse/SPARK-32230
 Project: Spark
  Issue Type: Test
  Components: Project Infra, R
Affects Versions: 3.1.0
Reporter: Hyukjin Kwon


Hadoop 3 is used by default as of SPARK-32058, but the AppVeyor build seems to be failing:

https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/33977845

We should use the Hadoop 2.7 profile in the AppVeyor build for now.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20680) Spark-sql do not support for void column datatype of view

2020-07-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-20680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153623#comment-17153623
 ] 

Apache Spark commented on SPARK-20680:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/29041

> Spark-sql do not support for void column datatype of view
> -
>
> Key: SPARK-20680
> URL: https://issues.apache.org/jira/browse/SPARK-20680
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.1.1, 2.4.6, 3.0.0
>Reporter: Lantao Jin
>Assignee: Lantao Jin
>Priority: Major
> Fix For: 3.1.0
>
>
> Create a HIVE view:
> {quote}
> hive> create table bad as select 1 x, null z from dual;
> {quote}
> Because there's no type, Hive gives it the VOID type:
> {quote}
> hive> describe bad;
> OK
> x int 
> z void
> {quote}
> In Spark 2.0.x, the behaviour when reading this view is normal:
> {quote}
> spark-sql> describe bad;
> x   int NULL
> z   voidNULL
> Time taken: 4.431 seconds, Fetched 2 row(s)
> {quote}
> But in Spark 2.1.x, it failed with SparkException: Cannot recognize hive type 
> string: void
> {quote}
> spark-sql> describe bad;
> 17/05/09 03:12:08 INFO execution.SparkSqlParser: Parsing command: describe bad
> 17/05/09 03:12:08 INFO parser.CatalystSqlParser: Parsing command: int
> 17/05/09 03:12:08 INFO parser.CatalystSqlParser: Parsing command: void
> 17/05/09 03:12:08 ERROR thriftserver.SparkSQLDriver: Failed in [describe bad]
> org.apache.spark.SparkException: Cannot recognize hive type string: void
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$fromHiveColumn(HiveClientImpl.scala:789)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:365)
>   
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:365)
>   
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:365)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:361)
> Caused by: org.apache.spark.sql.catalyst.parser.ParseException:
> DataType void() is not supported.(line 1, pos 0)
> == SQL ==  
> void   
> ^^^
> ... 61 more
> org.apache.spark.SparkException: Cannot recognize hive type string: void
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32024) Disk usage tracker went negative in HistoryServerDiskManager

2020-07-08 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-32024.
--
Fix Version/s: 3.1.0
   2.4.7
   3.0.1
   Resolution: Fixed

Issue resolved by pull request 28859
[https://github.com/apache/spark/pull/28859]

> Disk usage tracker went negative in HistoryServerDiskManager
> 
>
> Key: SPARK-32024
> URL: https://issues.apache.org/jira/browse/SPARK-32024
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0, 3.1.0
> Environment: System: Windows, Linux.
> Config:
> spark.history.retainedApplications 200
> spark.history.store.maxDiskUsage 10g
> spark.history.store.path /cache_hs
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Minor
> Fix For: 3.0.1, 2.4.7, 3.1.0
>
>
> After restarting the history server, we would see the error below randomly.
> h2. HTTP ERROR 500 java.lang.IllegalStateException: Disk usage tracker went 
> negative (now = -, delta = -)
> ||URI:|/history//*/stages/|
> ||STATUS:|500|
> ||MESSAGE:|java.lang.IllegalStateException: Disk usage tracker went negative 
> (now = -, delta = -)|
> ||SERVLET:|org.apache.spark.deploy.history.HistoryServer$$anon$1-6ce1f601|
> ||CAUSED BY:|java.lang.IllegalStateException: Disk usage tracker went 
> negative (now = -, delta = -)|
> h3. Caused by:
> java.lang.IllegalStateException: Disk usage tracker went negative (now = 
> -633925, delta = -38947) at 
> org.apache.spark.deploy.history.HistoryServerDiskManager.org$apache$spark$deploy$history$HistoryServerDiskManager$$updateUsage(HistoryServerDiskManager.scala:258)
>  at 
> org.apache.spark.deploy.history.HistoryServerDiskManager$Lease.rollback(HistoryServerDiskManager.scala:316)
>  at 
> org.apache.spark.deploy.history.FsHistoryProvider.loadDiskStore(FsHistoryProvider.scala:1192)
>  at 
> org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:363)
>  at 
> org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:191)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:163)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:135)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:161)
>  at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:56)
>  at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:52)
>  at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>  at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) 
> at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at 
> org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:89)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:101)
>  at 
> org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:248)
>  at 
> org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:101)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
> org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:763) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1631)
>  at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1618)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:549)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1363)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:489)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>  at 
> 

[jira] [Assigned] (SPARK-32024) Disk usage tracker went negative in HistoryServerDiskManager

2020-07-08 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-32024:


Assignee: Zhen Li

> Disk usage tracker went negative in HistoryServerDiskManager
> 
>
> Key: SPARK-32024
> URL: https://issues.apache.org/jira/browse/SPARK-32024
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0, 3.1.0
> Environment: System: Windows, Linux.
> Config:
> spark.history.retainedApplications 200
> spark.history.store.maxDiskUsage 10g
> spark.history.store.path /cache_hs
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Minor
>
> After restarting the history server, we would see the error below randomly.
> h2. HTTP ERROR 500 java.lang.IllegalStateException: Disk usage tracker went 
> negative (now = -, delta = -)
> ||URI:|/history//*/stages/|
> ||STATUS:|500|
> ||MESSAGE:|java.lang.IllegalStateException: Disk usage tracker went negative 
> (now = -, delta = -)|
> ||SERVLET:|org.apache.spark.deploy.history.HistoryServer$$anon$1-6ce1f601|
> ||CAUSED BY:|java.lang.IllegalStateException: Disk usage tracker went 
> negative (now = -, delta = -)|
> h3. Caused by:
> java.lang.IllegalStateException: Disk usage tracker went negative (now = 
> -633925, delta = -38947) at 
> org.apache.spark.deploy.history.HistoryServerDiskManager.org$apache$spark$deploy$history$HistoryServerDiskManager$$updateUsage(HistoryServerDiskManager.scala:258)
>  at 
> org.apache.spark.deploy.history.HistoryServerDiskManager$Lease.rollback(HistoryServerDiskManager.scala:316)
>  at 
> org.apache.spark.deploy.history.FsHistoryProvider.loadDiskStore(FsHistoryProvider.scala:1192)
>  at 
> org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:363)
>  at 
> org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:191)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:163)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:135)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:161)
>  at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:56)
>  at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:52)
>  at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>  at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) 
> at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at 
> org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:89)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:101)
>  at 
> org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:248)
>  at 
> org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:101)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
> org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:763) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1631)
>  at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1618)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:549)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1363)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:489)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>  at 
> org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1278)
>  at 
> org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at 
> 

[jira] [Commented] (SPARK-32120) Single GPU is allocated multiple times

2020-07-08 Thread wuyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153579#comment-17153579
 ] 

wuyi commented on SPARK-32120:
--

Oh yeah, we've documented it. Thanks for the reminder.

> Single GPU is allocated multiple times
> --
>
> Key: SPARK-32120
> URL: https://issues.apache.org/jira/browse/SPARK-32120
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.0.0
>Reporter: Enrico Minack
>Priority: Major
> Attachments: screenshot-2.png, screenshot-3.png
>
>
> I am running Spark in a {{local-cluster[2,1,1024]}} with one GPU per worker, 
> task and executor, and two GPUs provided through a GPU discovery script. The 
> same GPU is allocated to both executors.
> Discovery script output:
> {code:java}
> {"name": "gpu", "addresses": ["0", "1"]}
> {code}
> Spark local cluster setup through {{spark-shell}}:
> {code:java}
> ./spark-3.0.0-bin-hadoop2.7/bin/spark-shell --master 
> "local-cluster[2,1,1024]" --conf 
> spark.worker.resource.gpu.discoveryScript=/tmp/gpu.json --conf 
> spark.worker.resource.gpu.amount=1 --conf spark.task.resource.gpu.amount=1 
> --conf spark.executor.resource.gpu.amount=1
> {code}
> Executor page of this cluster:
>  !screenshot-2.png!
> You can see that both executors have the same GPU allocated: {{[1]}}
> Code run in the Spark shell:
> {code:java}
> scala> import org.apache.spark.TaskContext
> import org.apache.spark.TaskContext
> scala> def fn(it: Iterator[java.lang.Long]): Iterator[(String, (String, 
> Array[String]))] = { TaskContext.get().resources().mapValues(v => (v.name, 
> v.addresses)).iterator }
> fn: (it: Iterator[Long])Iterator[(String, (String, Array[String]))]
> scala> spark.range(0,2,1,2).mapPartitions(fn).collect
> res0: Array[(String, (String, Array[String]))] = Array((gpu,(gpu,Array(1))), 
> (gpu,(gpu,Array(1
> {code}
> The result shows that each task got GPU {{1}}. The executor page shows that 
> each task has been run on different executors (see above screenshot).
> The expected behaviour would have been to have GPU {{0}} assigned to one 
> executor and GPU {{1}} to the other executor. Consequently, each partition / 
> task should then see a different GPU.
> With Spark 3.0.0-preview2 the allocation was as expected (identical code and 
> Spark shell setup):
> {code:java}
> res0: Array[(String, (String, Array[String]))] = Array((gpu,(gpu,Array(0))), 
> (gpu,(gpu,Array(1
> {code}
> !screenshot-3.png!
> Happy to contribute a patch if this is an accepted bug.
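
As a side note for anyone setting up the reproduction: a discovery script is just an executable that prints ResourceInformation-style JSON like the output quoted above. A minimal sketch, assuming two GPUs with addresses 0 and 1 as in the report:

{code:python}
#!/usr/bin/env python3
# Minimal GPU discovery script sketch: Spark executes it and reads a single line
# of JSON with the resource name and the available addresses from stdout.
import json

print(json.dumps({"name": "gpu", "addresses": ["0", "1"]}))
{code}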



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32229) Application entry parsing fails because DriverWrapper registered instead of the normal driver

2020-07-08 Thread Gabor Somogyi (Jira)
Gabor Somogyi created SPARK-32229:
-

 Summary: Application entry parsing fails because DriverWrapper 
registered instead of the normal driver
 Key: SPARK-32229
 URL: https://issues.apache.org/jira/browse/SPARK-32229
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Gabor Somogyi


In some cases DriverWrapper is registered by DriverRegistry, which causes an 
exception in PostgresConnectionProvider:
https://github.com/apache/spark/blob/371b35d2e0ab08ebd853147c6673de3adfad0553/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/DriverRegistry.scala#L53




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32228) Partition column of hive table was capitalized while stored on HDFS

2020-07-08 Thread Kernel Force (Jira)
Kernel Force created SPARK-32228:


 Summary: Partition column of hive table was capitalized while 
stored on HDFS
 Key: SPARK-32228
 URL: https://issues.apache.org/jira/browse/SPARK-32228
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
 Environment: Hadoop 2.7.7
Hive 2.3.6
Spark 3.0.0
Reporter: Kernel Force


Suppose we have a target Hive table to be inserted into by Spark with the dynamic 
partition feature on.
{code:sql}
CREATE TABLE DEMO_PART (
ID VARCHAR(10),
NAME VARCHAR(10)
) PARTITIONED BY (BATCH DATE, TEAM VARCHAR(10))
STORED AS ORC;
{code}
And have a source data table like:
{code:sql}
0: jdbc:hive2://HOSTNAME:1> SELECT T.* FROM DEMO_DATA T;
+---+-+-+-+
| t.id  | t.name  |   t.batch   | t.team  |
+---+-+-+-+
| 1 | mike| 2020-07-08  | A   |
| 2 | john| 2020-07-07  | B   |
+---+-+-+-+
2 rows selected (0.177 seconds)
{code}
Then we join against an exploded view and insert the result into the 
DEMO_PART table:
{code:sql}
sql("""
WITH VA AS (
SELECT ARRAY_REPEAT(1,10) A
),
VB AS (
SELECT EXPLODE(T.A) IDX FROM VA T
),
VC AS (
SELECT ROW_NUMBER() OVER(ORDER BY NULL) RN FROM VB T
),
VD AS (
SELECT T.RN, DATE_ADD(TO_DATE('2020-07-01','-MM-dd'),T.RN) DT FROM VC T
),
VE AS (
SELECT T.DT BATCH, T.RN ID, CASE WHEN T.RN > 5 THEN 'A' ELSE 'B' END TEAM FROM 
VD T
)
SELECT T.BATCH BATCH, S.ID ID, S.NAME NAME, S.TEAM TEAM FROM VE T 
INNER JOIN DEMO_DATA S
ON T.TEAM = S.TEAM
""").
selectExpr(spark.table("DEMO_PART").columns:_*).
write.mode("overwrite").insertInto("DEMO_PART")
{code}
The result could NOT be read by Hive beeline:
{code:sql}
0: jdbc:hive2://HOSTNAME:1> SELECT T.* FROM DEMO_PART T;
+---+-+--+-+
| t.id  | t.name  | t.batch  | t.team  |
+---+-+--+-+
+---+-+--+-+
No rows selected (0.268 seconds)
{code}
This is because the underlying data stored in HDFS is incorrect:
{code:bash}
[user@HOSTNAME ~]$ dfs -ls /user/hive/warehouse/demo_part/  
  
Found 21 items
/user/hive/warehouse/demo_part/BATCH=2020-07-02
/user/hive/warehouse/demo_part/BATCH=2020-07-03
/user/hive/warehouse/demo_part/BATCH=2020-07-04
/user/hive/warehouse/demo_part/BATCH=2020-07-05
/user/hive/warehouse/demo_part/BATCH=2020-07-06
/user/hive/warehouse/demo_part/BATCH=2020-07-07
/user/hive/warehouse/demo_part/BATCH=2020-07-08
/user/hive/warehouse/demo_part/BATCH=2020-07-09
/user/hive/warehouse/demo_part/BATCH=2020-07-10
/user/hive/warehouse/demo_part/BATCH=2020-07-11
/user/hive/warehouse/demo_part/_SUCCESS
/user/hive/warehouse/demo_part/batch=2020-07-02
/user/hive/warehouse/demo_part/batch=2020-07-03
/user/hive/warehouse/demo_part/batch=2020-07-04
/user/hive/warehouse/demo_part/batch=2020-07-05
/user/hive/warehouse/demo_part/batch=2020-07-06
/user/hive/warehouse/demo_part/batch=2020-07-07
/user/hive/warehouse/demo_part/batch=2020-07-08
/user/hive/warehouse/demo_part/batch=2020-07-09
/user/hive/warehouse/demo_part/batch=2020-07-10
/user/hive/warehouse/demo_part/batch=2020-07-11
{code}
Both "BATCH=" and "batch=" directories appeared, and the data files was 
stored in "BATCH" directories but not "batch"

The result will be correct if I change the SQL statement, simply change the 
column alias to lower case in the last select, like:
{code:sql}
SELECT T.BATCH batch, S.ID id, S.NAME name, S.TEAM team FROM VE T 
INNER JOIN DEMO_DATA S
ON T.TEAM = S.TEAM
{code}
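
Equivalently, the same normalisation can be done on the Spark side before the insert (a workaround sketch only; `selectStatement` is a placeholder for the full query above):
{code:java}
// Sketch: rename every column to lower case so the dynamic-partition directories
// written by Spark match Hive's lower-cased partition column names.
val df = spark.sql(selectStatement)                       // hypothetical variable holding the SELECT above
val lowered = df.toDF(df.columns.map(_.toLowerCase): _*)  // lower-case all column names
lowered.write.mode("overwrite").insertInto("DEMO_PART")
{code}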



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32227) Bug in load-spark-env.cmd with Spark 3.0.0

2020-07-08 Thread Ihor Bobak (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ihor Bobak updated SPARK-32227:
---
Description: 
spark-env.cmd  which is located in conf  is not loaded by load-spark-env.cmd.

 

*How to reproduce:*

1) download spark 3.0.0 without hadoop and extract it

2) put a file conf/spark-env.cmd with the following contents (paths are 
relative to where my hadoop is - in C:\opt\hadoop\hadoop-3.2.1, you may need to 
change):

 

SET JAVA_HOME=C:\opt\Java\jdk1.8.0_241
 SET HADOOP_HOME=C:\opt\hadoop\hadoop-3.2.1
 SET HADOOP_CONF_DIR=C:\opt\hadoop\hadoop-3.2.1\conf
 SET 
SPARK_DIST_CLASSPATH=C:\opt\hadoop\hadoop-3.2.1\etc\hadoop;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\*

 

3) go to the bin directory and run pyspark.   You will get an error that some 
class can't be changed.

 

*How to fix:*

just take the load-spark-env.cmd  from Spark version 2.4.3, and everything will 
work.

[UPDATE]:  I attached a fixed version of load-spark-env.cmd  that works fine.

 

*What is the difference?*

I am not a good specialist in Windows batch, but doing a function

:LoadSparkEnv
 if exist "%SPARK_CONF_DIR%\spark-env.cmd" (
  call "%SPARK_CONF_DIR%\spark-env.cmd"
 )

and then calling it (as it was in 2.4.3) helps.

 

 

  was:
spark-env.cmd  which is located in conf  is not loaded by load-spark-env.cmd.

 

*How to reproduce:*

1) download spark 3.0.0 without hadoop and extract it

2) put a file conf/spark-env.cmd with the following contents (paths are 
relative to where my hadoop is - in C:\opt\hadoop\hadoop-3.2.1, you may need to 
change):

 

SET JAVA_HOME=C:\opt\Java\jdk1.8.0_241
SET HADOOP_HOME=C:\opt\hadoop\hadoop-3.2.1
SET HADOOP_CONF_DIR=C:\opt\hadoop\hadoop-3.2.1\conf
SET 
SPARK_DIST_CLASSPATH=C:\opt\hadoop\hadoop-3.2.1\etc\hadoop;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\*

 

3) go to the bin directory and run pyspark.   You will get an error that some 
class can't be changed.

 

*How to fix:*

just take the load-spark-env.cmd  from Spark version 2.4.3, and everything will 
work.

 

*What is the difference?*

I am not a good specialist in Windows batch, but doing a function

:LoadSparkEnv
if exist "%SPARK_CONF_DIR%\spark-env.cmd" (
 call "%SPARK_CONF_DIR%\spark-env.cmd"
)

and then calling it (as it was in 2.4.3) helps.


> Bug in load-spark-env.cmd  with Spark 3.0.0
> ---
>
> Key: SPARK-32227
> URL: https://issues.apache.org/jira/browse/SPARK-32227
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 3.0.0
> Environment: Windows 10
>Reporter: Ihor Bobak
>Priority: Major
> Fix For: 3.0.1
>
> Attachments: load-spark-env.cmd
>
>
> spark-env.cmd  which is located in conf  is not loaded by load-spark-env.cmd.
>  
> *How to reproduce:*
> 1) download spark 3.0.0 without hadoop and extract it
> 2) put a file conf/spark-env.cmd with the following contents (paths are 
> relative to where my hadoop is - in C:\opt\hadoop\hadoop-3.2.1, you may need 
> to change):
>  
> SET JAVA_HOME=C:\opt\Java\jdk1.8.0_241
>  SET HADOOP_HOME=C:\opt\hadoop\hadoop-3.2.1
>  SET HADOOP_CONF_DIR=C:\opt\hadoop\hadoop-3.2.1\conf
>  SET 
> SPARK_DIST_CLASSPATH=C:\opt\hadoop\hadoop-3.2.1\etc\hadoop;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\*
>  
> 3) go to the bin directory and run pyspark.   You will get 

[jira] [Updated] (SPARK-32227) Bug in load-spark-env.cmd with Spark 3.0.0

2020-07-08 Thread Ihor Bobak (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ihor Bobak updated SPARK-32227:
---
Attachment: load-spark-env.cmd

> Bug in load-spark-env.cmd  with Spark 3.0.0
> ---
>
> Key: SPARK-32227
> URL: https://issues.apache.org/jira/browse/SPARK-32227
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 3.0.0
> Environment: Windows 10
>Reporter: Ihor Bobak
>Priority: Major
> Fix For: 3.0.1
>
> Attachments: load-spark-env.cmd
>
>
> spark-env.cmd  which is located in conf  is not loaded by load-spark-env.cmd.
>  
> *How to reproduce:*
> 1) download spark 3.0.0 without hadoop and extract it
> 2) put a file conf/spark-env.cmd with the following contents (paths are 
> relative to where my hadoop is - in C:\opt\hadoop\hadoop-3.2.1, you may need 
> to change):
>  
> SET JAVA_HOME=C:\opt\Java\jdk1.8.0_241
> SET HADOOP_HOME=C:\opt\hadoop\hadoop-3.2.1
> SET HADOOP_CONF_DIR=C:\opt\hadoop\hadoop-3.2.1\conf
> SET 
> SPARK_DIST_CLASSPATH=C:\opt\hadoop\hadoop-3.2.1\etc\hadoop;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\*
>  
> 3) go to the bin directory and run pyspark.   You will get an error that some 
> class can't be changed.
>  
> *How to fix:*
> just take the load-spark-env.cmd  from Spark version 2.4.3, and everything 
> will work.
>  
> *What is the difference?*
> I am not a good specialist in Windows batch, but doing a function
> :LoadSparkEnv
> if exist "%SPARK_CONF_DIR%\spark-env.cmd" (
>  call "%SPARK_CONF_DIR%\spark-env.cmd"
> )
> and then calling it (as it was in 2.4.3) helps.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32227) Bug in load-spark-env.cmd with Spark 3.0.0

2020-07-08 Thread Ihor Bobak (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ihor Bobak updated SPARK-32227:
---
Description: 
spark-env.cmd  which is located in conf  is not loaded by load-spark-env.cmd.

 

*How to reproduce:*

1) download spark 3.0.0 without hadoop and extract it

2) put a file conf/spark-env.cmd with the following contents (paths are 
relative to where my hadoop is - in C:\opt\hadoop\hadoop-3.2.1, you may need to 
change):

 

SET JAVA_HOME=C:\opt\Java\jdk1.8.0_241
 SET HADOOP_HOME=C:\opt\hadoop\hadoop-3.2.1
 SET HADOOP_CONF_DIR=C:\opt\hadoop\hadoop-3.2.1\conf
 SET 
SPARK_DIST_CLASSPATH=C:\opt\hadoop\hadoop-3.2.1\etc\hadoop;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\*

 

3) go to the bin directory and run pyspark. You will get an error that log4j 
can't be found, etc. (the reason: the environment was indeed not loaded, so Spark 
does not see where Hadoop and all its jars are).

 

*How to fix:*

just take the load-spark-env.cmd  from Spark version 2.4.3, and everything will 
work.

[UPDATE]:  I attached a fixed version of load-spark-env.cmd  that works fine.

 

*What is the difference?*

I am not a good specialist in Windows batch, but doing a function

:LoadSparkEnv
 if exist "%SPARK_CONF_DIR%\spark-env.cmd" (
  call "%SPARK_CONF_DIR%\spark-env.cmd"
 )

and then calling it (as it was in 2.4.3) helps.

 

 

  was:
spark-env.cmd  which is located in conf  is not loaded by load-spark-env.cmd.

 

*How to reproduce:*

1) download spark 3.0.0 without hadoop and extract it

2) put a file conf/spark-env.cmd with the following contents (paths are 
relative to where my hadoop is - in C:\opt\hadoop\hadoop-3.2.1, you may need to 
change):

 

SET JAVA_HOME=C:\opt\Java\jdk1.8.0_241
 SET HADOOP_HOME=C:\opt\hadoop\hadoop-3.2.1
 SET HADOOP_CONF_DIR=C:\opt\hadoop\hadoop-3.2.1\conf
 SET 
SPARK_DIST_CLASSPATH=C:\opt\hadoop\hadoop-3.2.1\etc\hadoop;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\*

 

3) go to the bin directory and run pyspark.   You will get an error that some 
class can't be changed.

 

*How to fix:*

just take the load-spark-env.cmd  from Spark version 2.4.3, and everything will 
work.

[UPDATE]:  I attached a fixed version of load-spark-env.cmd  that works fine.

 

*What is the difference?*

I am not a good specialist in Windows batch, but doing a function

:LoadSparkEnv
 if exist "%SPARK_CONF_DIR%\spark-env.cmd" (
  call "%SPARK_CONF_DIR%\spark-env.cmd"
 )

and then calling it (as it was in 2.4.3) helps.

 

 


> Bug in load-spark-env.cmd  with Spark 3.0.0
> ---
>
> Key: SPARK-32227
> URL: https://issues.apache.org/jira/browse/SPARK-32227
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 3.0.0
> Environment: Windows 10
>Reporter: Ihor Bobak
>Priority: Major
> Fix For: 3.0.1
>
> Attachments: load-spark-env.cmd
>
>
> spark-env.cmd  which is located in conf  is not loaded by load-spark-env.cmd.
>  
> *How to reproduce:*
> 1) download spark 3.0.0 without hadoop and extract it
> 2) put a file conf/spark-env.cmd with the following contents (paths are 
> relative to where my hadoop is - in C:\opt\hadoop\hadoop-3.2.1, you may need 
> to change):
>  
> SET JAVA_HOME=C:\opt\Java\jdk1.8.0_241
>  SET HADOOP_HOME=C:\opt\hadoop\hadoop-3.2.1
>  SET HADOOP_CONF_DIR=C:\opt\hadoop\hadoop-3.2.1\conf
>  SET 
> 

[jira] [Created] (SPARK-32227) Bug in load-spark-env.cmd with Spark 3.0.0

2020-07-08 Thread Ihor Bobak (Jira)
Ihor Bobak created SPARK-32227:
--

 Summary: Bug in load-spark-env.cmd  with Spark 3.0.0
 Key: SPARK-32227
 URL: https://issues.apache.org/jira/browse/SPARK-32227
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell
Affects Versions: 3.0.0
 Environment: Windows 10
Reporter: Ihor Bobak
 Fix For: 3.0.1


spark-env.cmd, which is located in conf, is not loaded by load-spark-env.cmd.

 

*How to reproduce:*

1) download spark 3.0.0 without hadoop and extract it

2) put a file conf/spark-env.cmd with the following contents (paths are 
relative to where my hadoop is - in C:\opt\hadoop\hadoop-3.2.1, you may need to 
change):

 

SET JAVA_HOME=C:\opt\Java\jdk1.8.0_241
SET HADOOP_HOME=C:\opt\hadoop\hadoop-3.2.1
SET HADOOP_CONF_DIR=C:\opt\hadoop\hadoop-3.2.1\conf
SET 
SPARK_DIST_CLASSPATH=C:\opt\hadoop\hadoop-3.2.1\etc\hadoop;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\common\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\hdfs\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\yarn\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\lib\*;C:\opt\hadoop\hadoop-3.2.1\share\hadoop\mapreduce\*

 

3) go to the bin directory and run pyspark.   You will get an error that some 
class can't be changed.

 

*How to fix:*

just take the load-spark-env.cmd  from Spark version 2.4.3, and everything will 
work.

 

*What is the difference?*

I am not an expert in Windows batch, but defining a subroutine

:LoadSparkEnv
if exist "%SPARK_CONF_DIR%\spark-env.cmd" (
 call "%SPARK_CONF_DIR%\spark-env.cmd"
)

and then calling it (as was done in 2.4.3) fixes the problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32226) JDBC TimeStamp predicates always append `.0`

2020-07-08 Thread Mathew Wicks (Jira)
Mathew Wicks created SPARK-32226:


 Summary: JDBC TimeStamp predicates always append `.0`
 Key: SPARK-32226
 URL: https://issues.apache.org/jira/browse/SPARK-32226
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Mathew Wicks


If you have an Informix column with type `DATETIME YEAR TO SECOND`, Informix 
will not let you pass a filter of the form `2020-01-01 00:00:00.0` (with the 
`.0` at the end).

 

In Spark 3.0.0, our predicate pushdown will always append this `.0` to the end 
of a TimeStamp column filter, even if you don't specify it:
{code:java}
df.where("col1 > '2020-01-01 00:00:00'")
{code}
 

I think we should only pass the `.XXX` suffix if the user passes it in the 
filter, for example:
{code:java}
df.where("col1 > '2020-01-01 00:00:00.123'")
{code}
 

The relevant Spark class is:
{code:java}
org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString
{code}
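
To make the reported behaviour concrete, here is a small illustration (a sketch only, not the Spark implementation) of how `java.sql.Timestamp.toString` always renders a fractional part and how a trailing `.0` could be trimmed when the user did not specify one:
{code:java}
import java.sql.Timestamp

val ts = Timestamp.valueOf("2020-01-01 00:00:00")
println(ts.toString)              // 2020-01-01 00:00:00.0 -- the suffix Informix rejects

// Possible trimming rule (sketch): drop the fractional part only when it is exactly ".0".
def renderForPushdown(ts: Timestamp): String = {
  val s = ts.toString
  if (s.endsWith(".0")) s.dropRight(2) else s
}

println(renderForPushdown(ts))    // 2020-01-01 00:00:00
{code}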
 
To aid people searching for this error, here is the error emitted by Spark:
{code:java}
Driver stacktrace:
  at 
org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2023)
  at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:1972)
  at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:1971)
  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1971)
  at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:950)
  at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:950)
  at scala.Option.foreach(Option.scala:407)
  at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:950)
  at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2203)
  at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2152)
  at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2141)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:752)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2093)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2114)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2133)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:467)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:420)
  at 
org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:47)
  at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3625)
  at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2695)
  at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
  at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
  at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
  at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
  at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2695)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2902)
  at org.apache.spark.sql.Dataset.getRows(Dataset.scala:300)
  at org.apache.spark.sql.Dataset.showString(Dataset.scala:337)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:824)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:783)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:792)
  ... 47 elided
Caused by: java.sql.SQLException: Extra characters at the end of a datetime or 
interval.
  at com.informix.util.IfxErrMsg.buildExceptionWithMessage(IfxErrMsg.java:416)
  at com.informix.util.IfxErrMsg.buildIsamException(IfxErrMsg.java:401)
  at com.informix.jdbc.IfxSqli.addException(IfxSqli.java:3096)
  at com.informix.jdbc.IfxSqli.receiveError(IfxSqli.java:3368)
  at com.informix.jdbc.IfxSqli.dispatchMsg(IfxSqli.java:2292)
  at com.informix.jdbc.IfxSqli.receiveMessage(IfxSqli.java:2217)
  at com.informix.jdbc.IfxSqli.executePrepare(IfxSqli.java:1213)
  at 
com.informix.jdbc.IfxPreparedStatement.setupExecutePrepare(IfxPreparedStatement.java:245)
  at 
com.informix.jdbc.IfxPreparedStatement.processSQL(IfxPreparedStatement.java:229)
  at 
com.informix.jdbc.IfxPreparedStatement.<init>(IfxPreparedStatement.java:119)
  at 

[jira] [Created] (SPARK-32225) Parquet footer information is read twice

2020-07-08 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created SPARK-32225:


 Summary: Parquet footer information is read twice
 Key: SPARK-32225
 URL: https://issues.apache.org/jira/browse/SPARK-32225
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Rajesh Balamohan
 Attachments: spark_parquet_footer_reads.png

When running queries, Spark reads Parquet footer information twice. In cloud 
environments this can be expensive (depending on the job and the number of splits). 
It would be nice to reuse the footer information already read via 
"ParquetInputFormat::buildReaderWithPartitionValues".

 

!image-2020-07-08-14-24-23-470.png|width=726,height=730!

Lines of interest:

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L271]


[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L326]

 

[https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java#L105]


[https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java#L111]
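
A rough sketch of the idea (illustration only, not the actual change): read the footer once per split and thread the resulting metadata through, instead of letting the record reader call readFooter again.
{code:java}
// Sketch: read the Parquet footer once and reuse the ParquetMetadata, rather than
// reading it both in ParquetFileFormat and again in SpecificParquetRecordReaderBase.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.format.converter.ParquetMetadataConverter
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.metadata.ParquetMetadata

def readFooterOnce(conf: Configuration, file: Path): ParquetMetadata =
  ParquetFileReader.readFooter(conf, file, ParquetMetadataConverter.NO_FILTER)
{code}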

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32225) Parquet footer information is read twice

2020-07-08 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated SPARK-32225:
-
Attachment: spark_parquet_footer_reads.png

> Parquet footer information is read twice
> 
>
> Key: SPARK-32225
> URL: https://issues.apache.org/jira/browse/SPARK-32225
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: spark_parquet_footer_reads.png
>
>
> When running queries, spark reads parquet footer information twice. In cloud 
> env, this would turn out to be expensive (depending on the jobs, # of 
> splits). It would be nice to reuse the footer information already read via 
> "ParquetInputFormat::buildReaderWithPartitionValues"
>  
> !image-2020-07-08-14-24-23-470.png|width=726,height=730!
> Lines of interest:
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L271]
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L326]
>  
> [https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java#L105]
> [https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java#L111]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32225) Parquet footer information is read twice

2020-07-08 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated SPARK-32225:
-
Description: 
When running queries, spark reads parquet footer information twice. In cloud 
env, this would turn out to be expensive (depending on the jobs, # of splits). 
It would be nice to reuse the footer information already read via 
"ParquetInputFormat::buildReaderWithPartitionValues"

 

!spark_parquet_footer_reads.png|width=640,height=644!

Lines of interest:

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L271]

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L326]

 

[https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java#L105]

[https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java#L111]

 

  was:
When running queries, spark reads parquet footer information twice. In cloud 
env, this would turn out to be expensive (depending on the jobs, # of splits). 
It would be nice to reuse the footer information already read via 
"ParquetInputFormat::buildReaderWithPartitionValues"

 

!image-2020-07-08-14-24-23-470.png|width=726,height=730!

Lines of interest:

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L271]


[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L326]

 

[https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java#L105]


[https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java#L111]

 


> Parquet footer information is read twice
> 
>
> Key: SPARK-32225
> URL: https://issues.apache.org/jira/browse/SPARK-32225
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: spark_parquet_footer_reads.png
>
>
> When running queries, spark reads parquet footer information twice. In cloud 
> env, this would turn out to be expensive (depending on the jobs, # of 
> splits). It would be nice to reuse the footer information already read via 
> "ParquetInputFormat::buildReaderWithPartitionValues"
>  
> !spark_parquet_footer_reads.png|width=640,height=644!
> Lines of interest:
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L271]
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L326]
>  
> [https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java#L105]
> [https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java#L111]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32214) The type conversion function generated in makeFromJava for "other" type uses a wrong variable.

2020-07-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-32214.
--
Fix Version/s: 3.1.0
   3.0.1
   2.4.7
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/29029
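
Based on the description below, the presumed one-token fix is to convert the actual value rather than the matched type (a sketch, not copied from the PR):
{code:java}
// Before: the converter ignores its argument and converts `other` (the matched type).
case other => (obj: Any) => nullSafeConvert(other)(PartialFunction.empty)
// After (presumed fix): convert the value that was actually passed in.
case other => (obj: Any) => nullSafeConvert(obj)(PartialFunction.empty)
{code}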

> The type conversion function generated in makeFromJava for "other"  type uses 
> a wrong variable.
> ---
>
> Key: SPARK-32214
> URL: https://issues.apache.org/jira/browse/SPARK-32214
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.6, 3.0.0, 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
> Fix For: 2.4.7, 3.0.1, 3.1.0
>
>
> `makeFromJava` in `EvaluatePython` creates a type conversion function for some 
> Java/Scala types.
> For the `other` type, the parameter of the type conversion function is named 
> `obj`, but `other` is mistakenly used rather than `obj` in the function body.
> {code:java}
> case other => (obj: Any) => nullSafeConvert(other)(PartialFunction.empty) 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32224) Support pass 'spark.yarn.priority' through the SparkSubmit on Yarn modes

2020-07-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153357#comment-17153357
 ] 

Apache Spark commented on SPARK-32224:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/29037

> Support pass 'spark.yarn.priority' through the SparkSubmit on Yarn modes
> 
>
> Key: SPARK-32224
> URL: https://issues.apache.org/jira/browse/SPARK-32224
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Submit
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Minor
>
> SPARK-29603 added support for configuring 'spark.yarn.priority' on Yarn deploy modes; 
> should we also support passing this configuration through the SparkSubmit command line?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32224) Support pass 'spark.yarn.priority' through the SparkSubmit on Yarn modes

2020-07-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32224:


Assignee: Apache Spark

> Support pass 'spark.yarn.priority' through the SparkSubmit on Yarn modes
> 
>
> Key: SPARK-32224
> URL: https://issues.apache.org/jira/browse/SPARK-32224
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Submit
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> SPARK-29603 added support for configuring 'spark.yarn.priority' on Yarn deploy modes; 
> should we also support passing this configuration through the SparkSubmit command line?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32224) Support pass 'spark.yarn.priority' through the SparkSubmit on Yarn modes

2020-07-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32224:


Assignee: (was: Apache Spark)

> Support pass 'spark.yarn.priority' through the SparkSubmit on Yarn modes
> 
>
> Key: SPARK-32224
> URL: https://issues.apache.org/jira/browse/SPARK-32224
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Submit
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Minor
>
> SPARK-29603 added support for configuring 'spark.yarn.priority' on Yarn deploy modes; 
> should we also support passing this configuration through the SparkSubmit command line?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32224) Support pass 'spark.yarn.priority' through the SparkSubmit on Yarn modes

2020-07-08 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-32224:
-
Description: SPARK-29603 support configure 'spark.yarn.priority' on Yarn 
deploy modes, and should we support pass this configuration through the 
SparkSubmit command line  (was: SPARK-29603 support configure 
'spark.yarn.priority' on Yarn deploy modes, and should we support pass this 
configuration through the SparkSubmit commad line)

> Support pass 'spark.yarn.priority' through the SparkSubmit on Yarn modes
> 
>
> Key: SPARK-32224
> URL: https://issues.apache.org/jira/browse/SPARK-32224
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Submit
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Minor
>
> SPARK-29603 added support for configuring 'spark.yarn.priority' on Yarn deploy modes; 
> should we also support passing this configuration through the SparkSubmit command line?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32224) Support pass 'spark.yarn.priority' through the SparkSubmit on Yarn modes

2020-07-08 Thread Yang Jie (Jira)
Yang Jie created SPARK-32224:


 Summary: Support pass 'spark.yarn.priority' through the 
SparkSubmit on Yarn modes
 Key: SPARK-32224
 URL: https://issues.apache.org/jira/browse/SPARK-32224
 Project: Spark
  Issue Type: Improvement
  Components: Spark Submit
Affects Versions: 3.1.0
Reporter: Yang Jie


SPARK-29603 added support for configuring 'spark.yarn.priority' on Yarn deploy modes; 
should we also support passing this configuration through the SparkSubmit command line?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31760) Simplification Based on Containment

2020-07-08 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153324#comment-17153324
 ] 

Yuming Wang commented on SPARK-31760:
-

OK. Thank you [~glashenko].

> Simplification Based on Containment
> ---
>
> Key: SPARK-31760
> URL: https://issues.apache.org/jira/browse/SPARK-31760
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: starter
>
> https://docs.teradata.com/reader/Ws7YT1jvRK2vEr1LpVURug/V~FCwD9BL7gY4ac3WwHInw



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32223) Support adding a user provided config map.

2020-07-08 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-32223:
---

 Summary: Support adding a user provided config map.
 Key: SPARK-32223
 URL: https://issues.apache.org/jira/browse/SPARK-32223
 Project: Spark
  Issue Type: Sub-task
  Components: Kubernetes
Affects Versions: 3.1.0
Reporter: Prashant Sharma


The semantics of this will be discussed and added soon.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32222) Add integration tests

2020-07-08 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-3:
---

 Summary: Add integration tests
 Key: SPARK-3
 URL: https://issues.apache.org/jira/browse/SPARK-3
 Project: Spark
  Issue Type: Sub-task
  Components: Kubernetes
Affects Versions: 3.1.0
Reporter: Prashant Sharma


Add an integration test that places a configuration file in SPARK_CONF_DIR and 
verifies it is loaded on the executors in both client and cluster deploy modes. 
A log4j.properties file is a good candidate for this, as sketched below.
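
A hypothetical sketch of the kind of check such a test could perform (not actual test code; assumes a running SparkContext `sc` and that SPARK_CONF_DIR is set on the executors):
{code:java}
import java.io.File

// Run a tiny job and verify the file shipped via SPARK_CONF_DIR is visible on executors.
val confFile = "log4j.properties"
val visibleEverywhere = sc
  .parallelize(1 to 4, 4)
  .map { _ =>
    sys.env.get("SPARK_CONF_DIR").exists(dir => new File(dir, confFile).isFile)
  }
  .collect()
  .forall(identity)
assert(visibleEverywhere, s"$confFile was not found under SPARK_CONF_DIR on every executor")
{code}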



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32213) saveAsTable deletes all files in path

2020-07-08 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153312#comment-17153312
 ] 

angerszhu commented on SPARK-32213:
---

[~yuvalr]

You can review my PR and give some suggestions.

 

> saveAsTable deletes all files in path
> -
>
> Key: SPARK-32213
> URL: https://issues.apache.org/jira/browse/SPARK-32213
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.0.0
>Reporter: Yuval Rochman
>Priority: Major
>
> The problem is presented in the following link:
> [https://stackoverflow.com/questions/62782637/saveastable-can-delete-all-my-files-in-desktop?noredirect=1#comment111026138_62782637]
> Apparently, without any warning, all files on the desktop were deleted after 
> writing a file.
> There is no warning in PySpark that the "path" parameter can cause this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32221) Avoid possible errors due to incorrect file size or type supplied in spark conf.

2020-07-08 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-32221:
---

 Summary: Avoid possible errors due to incorrect file size or type 
supplied in spark conf.
 Key: SPARK-32221
 URL: https://issues.apache.org/jira/browse/SPARK-32221
 Project: Spark
  Issue Type: Sub-task
  Components: Kubernetes
Affects Versions: 3.1.0
Reporter: Prashant Sharma


This would avoid failures when a file is a bit large or a user places a 
binary file inside SPARK_CONF_DIR.

Neither of these is supported at the moment.

The reason is that the underlying etcd store limits the size of each entry to only 
1 MiB. Once etcd is upgraded in all the popular k8s clusters, we can hope 
to overcome this limitation; e.g. the version of etcd documented at 
[https://etcd.io/docs/v3.4.0/dev-guide/limit/] allows a 
higher limit on each entry.

Even if that does not happen, there are other ways to overcome this limitation; 
for example, config files could be split across multiple ConfigMaps. That needs 
discussion and prioritisation, so this issue takes the straightforward approach of 
skipping files that cannot be accommodated within the 1 MiB limit and warning the 
user about it, as sketched below.
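
A minimal sketch of that approach (hypothetical helper, not the actual Spark code):
{code:java}
import java.io.File

// Skip conf files that cannot fit in a single ConfigMap entry (etcd limits each
// entry to roughly 1 MiB) and warn the user about every file that is skipped.
val maxEntryBytes: Long = 1024L * 1024L

def eligibleConfFiles(confDir: File): Seq[File] =
  Option(confDir.listFiles()).getOrElse(Array.empty[File]).toSeq.filter { f =>
    val ok = f.isFile && f.length() <= maxEntryBytes
    if (!ok) println(s"WARN: skipping ${f.getName}: too large for a single ConfigMap entry")
    ok
  }
{code}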



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32213) saveAsTable deletes all files in path

2020-07-08 Thread Yuval Rochman (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153310#comment-17153310
 ] 

Yuval Rochman commented on SPARK-32213:
---

Agreed.

> saveAsTable deletes all files in path
> -
>
> Key: SPARK-32213
> URL: https://issues.apache.org/jira/browse/SPARK-32213
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.0.0
>Reporter: Yuval Rochman
>Priority: Major
>
> The problem is presented in the following link:
> [https://stackoverflow.com/questions/62782637/saveastable-can-delete-all-my-files-in-desktop?noredirect=1#comment111026138_62782637]
> Apparently, without any warning, all files on the desktop were deleted after 
> writing a file.
> There is no warning in PySpark that the "path" parameter can cause this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32220) Cartesian Product Hint cause data error

2020-07-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32220:


Assignee: (was: Apache Spark)

> Cartesian Product Hint cause data error
> ---
>
> Key: SPARK-32220
> URL: https://issues.apache.org/jira/browse/SPARK-32220
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Priority: Major
>
> {code:java}
> spark-sql> select * from test4 order by a asc;
> 1 2
> Time taken: 1.063 seconds, Fetched 4 row(s)20/07/08 14:11:25 INFO 
> SparkSQLCLIDriver: Time taken: 1.063 seconds, Fetched 4 row(s)
> spark-sql>select * from test5 order by a asc
> 1 2
> 2 2
> Time taken: 1.18 seconds, Fetched 24 row(s)20/07/08 14:13:59 INFO 
> SparkSQLCLIDriver: Time taken: 1.18 seconds, Fetched 24 row(s)spar
> spark-sql>select /*+ shuffle_replicate_nl(test4) */ * from test4 join test5 
> where test4.a = test5.a order by test4.a asc ;
> 1 2 1 2
> 1 2 2 2
> Time taken: 0.351 seconds, Fetched 2 row(s)
> 20/07/08 14:18:16 INFO SparkSQLCLIDriver: Time taken: 0.351 seconds, Fetched 
> 2 row(s){code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32220) Cartesian Product Hint cause data error

2020-07-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153303#comment-17153303
 ] 

Apache Spark commented on SPARK-32220:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/29035

> Cartesian Product Hint cause data error
> ---
>
> Key: SPARK-32220
> URL: https://issues.apache.org/jira/browse/SPARK-32220
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Priority: Major
>
> {code:java}
> spark-sql> select * from test4 order by a asc;
> 1 2
> Time taken: 1.063 seconds, Fetched 4 row(s)20/07/08 14:11:25 INFO 
> SparkSQLCLIDriver: Time taken: 1.063 seconds, Fetched 4 row(s)
> spark-sql>select * from test5 order by a asc
> 1 2
> 2 2
> Time taken: 1.18 seconds, Fetched 24 row(s)20/07/08 14:13:59 INFO 
> SparkSQLCLIDriver: Time taken: 1.18 seconds, Fetched 24 row(s)spar
> spark-sql>select /*+ shuffle_replicate_nl(test4) */ * from test4 join test5 
> where test4.a = test5.a order by test4.a asc ;
> 1 2 1 2
> 1 2 2 2
> Time taken: 0.351 seconds, Fetched 2 row(s)
> 20/07/08 14:18:16 INFO SparkSQLCLIDriver: Time taken: 0.351 seconds, Fetched 
> 2 row(s){code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32220) Cartesian Product Hint cause data error

2020-07-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32220:


Assignee: Apache Spark

> Cartesian Product Hint cause data error
> ---
>
> Key: SPARK-32220
> URL: https://issues.apache.org/jira/browse/SPARK-32220
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> spark-sql> select * from test4 order by a asc;
> 1 2
> Time taken: 1.063 seconds, Fetched 4 row(s)20/07/08 14:11:25 INFO 
> SparkSQLCLIDriver: Time taken: 1.063 seconds, Fetched 4 row(s)
> spark-sql>select * from test5 order by a asc
> 1 2
> 2 2
> Time taken: 1.18 seconds, Fetched 24 row(s)20/07/08 14:13:59 INFO 
> SparkSQLCLIDriver: Time taken: 1.18 seconds, Fetched 24 row(s)spar
> spark-sql>select /*+ shuffle_replicate_nl(test4) */ * from test4 join test5 
> where test4.a = test5.a order by test4.a asc ;
> 1 2 1 2
> 1 2 2 2
> Time taken: 0.351 seconds, Fetched 2 row(s)
> 20/07/08 14:18:16 INFO SparkSQLCLIDriver: Time taken: 0.351 seconds, Fetched 
> 2 row(s){code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32219) Add SHOW CACHED TABLES Command

2020-07-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153267#comment-17153267
 ] 

Apache Spark commented on SPARK-32219:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/29034

> Add SHOW CACHED TABLES Command
> --
>
> Key: SPARK-32219
> URL: https://issues.apache.org/jira/browse/SPARK-32219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32219) Add SHOW CACHED TABLES Command

2020-07-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32219:


Assignee: Apache Spark

> Add SHOW CACHED TABLES Command
> --
>
> Key: SPARK-32219
> URL: https://issues.apache.org/jira/browse/SPARK-32219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32219) Add SHOW CACHED TABLES Command

2020-07-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153266#comment-17153266
 ] 

Apache Spark commented on SPARK-32219:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/29034

> Add SHOW CACHED TABLES Command
> --
>
> Key: SPARK-32219
> URL: https://issues.apache.org/jira/browse/SPARK-32219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32219) Add SHOW CACHED TABLES Command

2020-07-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32219:


Assignee: (was: Apache Spark)

> Add SHOW CACHED TABLES Command
> --
>
> Key: SPARK-32219
> URL: https://issues.apache.org/jira/browse/SPARK-32219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32220) Cartesian Product Hint cause data error

2020-07-08 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153260#comment-17153260
 ] 

angerszhu commented on SPARK-32220:
---

raise a pr soon

> Cartesian Product Hint cause data error
> ---
>
> Key: SPARK-32220
> URL: https://issues.apache.org/jira/browse/SPARK-32220
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Priority: Major
>
> {code:java}
> spark-sql> select * from test4 order by a asc;
> 1 2
> Time taken: 1.063 seconds, Fetched 4 row(s)20/07/08 14:11:25 INFO 
> SparkSQLCLIDriver: Time taken: 1.063 seconds, Fetched 4 row(s)
> spark-sql>select * from test5 order by a asc
> 1 2
> 2 2
> Time taken: 1.18 seconds, Fetched 24 row(s)20/07/08 14:13:59 INFO 
> SparkSQLCLIDriver: Time taken: 1.18 seconds, Fetched 24 row(s)spar
> spark-sql>select /*+ shuffle_replicate_nl(test4) */ * from test4 join test5 
> where test4.a = test5.a order by test4.a asc ;
> 1 2 1 2
> 1 2 2 2
> Time taken: 0.351 seconds, Fetched 2 row(s)
> 20/07/08 14:18:16 INFO SparkSQLCLIDriver: Time taken: 0.351 seconds, Fetched 
> 2 row(s){code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32220) Cartesian Product Hint cause data error

2020-07-08 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-32220:
--
Description: 
{code:java}
spark-sql> select * from test4 order by a asc;
1 2
Time taken: 1.063 seconds, Fetched 4 row(s)20/07/08 14:11:25 INFO 
SparkSQLCLIDriver: Time taken: 1.063 seconds, Fetched 4 row(s)
spark-sql>select * from test5 order by a asc
1 2
2 2
Time taken: 1.18 seconds, Fetched 24 row(s)20/07/08 14:13:59 INFO 
SparkSQLCLIDriver: Time taken: 1.18 seconds, Fetched 24 row(s)spar
spark-sql>select /*+ shuffle_replicate_nl(test4) */ * from test4 join test5 
where test4.a = test5.a order by test4.a asc ;
1 2 1 2
1 2 2 2
Time taken: 0.351 seconds, Fetched 2 row(s)
20/07/08 14:18:16 INFO SparkSQLCLIDriver: Time taken: 0.351 seconds, Fetched 2 
row(s){code}

  was:
{code:java}
spark-sql> select * from test4 order by a asc;
1 2
Time taken: 1.063 seconds, Fetched 4 row(s)20/07/08 14:11:25 INFO 
SparkSQLCLIDriver: Time taken: 1.063 seconds, Fetched 4 row(s)
spark-sql>select * from test5 order by a asc
1 2
2 2

Time taken: 1.18 seconds, Fetched 24 row(s)20/07/08 14:13:59 INFO 
SparkSQLCLIDriver: Time taken: 1.18 seconds, Fetched 24 row(s)spar
{code}


> Cartesian Product Hint cause data error
> ---
>
> Key: SPARK-32220
> URL: https://issues.apache.org/jira/browse/SPARK-32220
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Priority: Major
>
> {code:java}
> spark-sql> select * from test4 order by a asc;
> 1 2
> Time taken: 1.063 seconds, Fetched 4 row(s)20/07/08 14:11:25 INFO 
> SparkSQLCLIDriver: Time taken: 1.063 seconds, Fetched 4 row(s)
> spark-sql>select * from test5 order by a asc
> 1 2
> 2 2
> Time taken: 1.18 seconds, Fetched 24 row(s)20/07/08 14:13:59 INFO 
> SparkSQLCLIDriver: Time taken: 1.18 seconds, Fetched 24 row(s)spar
> spark-sql>select /*+ shuffle_replicate_nl(test4) */ * from test4 join test5 
> where test4.a = test5.a order by test4.a asc ;
> 1 2 1 2
> 1 2 2 2
> Time taken: 0.351 seconds, Fetched 2 row(s)
> 20/07/08 14:18:16 INFO SparkSQLCLIDriver: Time taken: 0.351 seconds, Fetched 
> 2 row(s){code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32220) Cartesian Product Hint cause data error

2020-07-08 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-32220:
--
Description: 
{code:java}
spark-sql> select * from test4 order by a asc;
1 2
Time taken: 1.063 seconds, Fetched 4 row(s)20/07/08 14:11:25 INFO 
SparkSQLCLIDriver: Time taken: 1.063 seconds, Fetched 4 row(s)
spark-sql>select * from test5 order by a asc
1 2
2 2

Time taken: 1.18 seconds, Fetched 24 row(s)20/07/08 14:13:59 INFO 
SparkSQLCLIDriver: Time taken: 1.18 seconds, Fetched 24 row(s)spar
{code}

  was:
{code:java}
spark-sql> select * from test1 order by a asc;
1 2
1 4
2 3
3 4
Time taken: 1.063 seconds, Fetched 4 row(s)20/07/08 14:11:25 INFO 
SparkSQLCLIDriver: Time taken: 1.063 seconds, Fetched 4 row(s)
spark-sql>select * from test2 order by a asc
1 2
1 3
1 3
2 4
2 5
3 1
4 1
4 5
5 1
5 3
5 4
6 10
6 3
6 8
7 1
7 10
7 11
7 12
7 13
7 15
7 20
7 5
7 8
8 2
Time taken: 1.18 seconds, Fetched 24 row(s)20/07/08 14:13:59 INFO 
SparkSQLCLIDriver: Time taken: 1.18 seconds, Fetched 24 row(s)spar
{code}


> Cartesian Product Hint cause data error
> ---
>
> Key: SPARK-32220
> URL: https://issues.apache.org/jira/browse/SPARK-32220
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Priority: Major
>
> {code:java}
> spark-sql> select * from test4 order by a asc;
> 1 2
> Time taken: 1.063 seconds, Fetched 4 row(s)20/07/08 14:11:25 INFO 
> SparkSQLCLIDriver: Time taken: 1.063 seconds, Fetched 4 row(s)
> spark-sql>select * from test5 order by a asc
> 1 2
> 2 2
> Time taken: 1.18 seconds, Fetched 24 row(s)20/07/08 14:13:59 INFO 
> SparkSQLCLIDriver: Time taken: 1.18 seconds, Fetched 24 row(s)spar
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32220) Cartesian Product Hint cause data error

2020-07-08 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-32220:
--
Description: 
{code:java}
spark-sql> select * from test1 order by a asc;
1 2
1 4
2 3
3 4
Time taken: 1.063 seconds, Fetched 4 row(s)20/07/08 14:11:25 INFO 
SparkSQLCLIDriver: Time taken: 1.063 seconds, Fetched 4 row(s)
spark-sql>select * from test2 order by a asc
1 2
1 3
1 3
2 4
2 5
3 1
4 1
4 5
5 1
5 3
5 4
6 10
6 3
6 8
7 1
7 10
7 11
7 12
7 13
7 15
7 20
7 5
7 8
8 2
Time taken: 1.18 seconds, Fetched 24 row(s)20/07/08 14:13:59 INFO 
SparkSQLCLIDriver: Time taken: 1.18 seconds, Fetched 24 row(s)spar
{code}

> Cartesian Product Hint cause data error
> ---
>
> Key: SPARK-32220
> URL: https://issues.apache.org/jira/browse/SPARK-32220
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Priority: Major
>
> {code:java}
> spark-sql> select * from test1 order by a asc;
> 1 2
> 1 4
> 2 3
> 3 4
> Time taken: 1.063 seconds, Fetched 4 row(s)20/07/08 14:11:25 INFO 
> SparkSQLCLIDriver: Time taken: 1.063 seconds, Fetched 4 row(s)
> spark-sql>select * from test2 order by a asc
> 1 2
> 1 3
> 1 3
> 2 4
> 2 5
> 3 1
> 4 1
> 4 5
> 5 1
> 5 3
> 5 4
> 6 10
> 6 3
> 6 8
> 7 1
> 7 10
> 7 11
> 7 12
> 7 13
> 7 15
> 7 20
> 7 5
> 7 8
> 8 2
> Time taken: 1.18 seconds, Fetched 24 row(s)20/07/08 14:13:59 INFO 
> SparkSQLCLIDriver: Time taken: 1.18 seconds, Fetched 24 row(s)spar
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32220) Cartesian Product Hint causes data error

2020-07-08 Thread angerszhu (Jira)
angerszhu created SPARK-32220:
-

 Summary: Cartesian Product Hint causes data error
 Key: SPARK-32220
 URL: https://issues.apache.org/jira/browse/SPARK-32220
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: angerszhu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32200) Redirect to the history page when accessing /history without an application id

2020-07-08 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-32200:
---
Affects Version/s: (was: 3.0.1)
   3.0.0

> Redirect to the history page when accessing /history without an application id
> 
>
> Key: SPARK-32200
> URL: https://issues.apache.org/jira/browse/SPARK-32200
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
>  
> In the current master, when we access /history on the HistoryServer without 
> an application id, status code 400 is returned.
> It would be better to redirect to the history page instead for better UX.
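>
To make the proposed behaviour concrete, here is a rough sketch using the plain javax.servlet API. It is not Spark's HistoryServer code, and the class name and redirect target are made up; it only shows /history answering with a redirect instead of a 400 when no application id is given.

{code:scala}
// Illustrative only: not Spark's actual HistoryServer implementation.
import javax.servlet.http.{HttpServlet, HttpServletRequest, HttpServletResponse}

class HistoryRedirectServlet extends HttpServlet {
  override def doGet(req: HttpServletRequest, resp: HttpServletResponse): Unit = {
    // getPathInfo is null (or "/") when /history is requested without an id.
    val appId = Option(req.getPathInfo).map(_.stripPrefix("/")).filter(_.nonEmpty)
    appId match {
      case Some(id) =>
        // Normal case: serve the per-application history page.
        resp.setContentType("text/plain")
        resp.getWriter.println(s"history page for application $id")
      case None =>
        // Proposed behaviour: redirect to the listing page instead of
        // returning 400 Bad Request.
        resp.sendRedirect("/")
    }
  }
}
{code}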



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32200) Redirect to the history page when accessing /history without an application id

2020-07-08 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-32200:
---
Affects Version/s: (was: 3.0.0)

> Redirect to the history page when accessing /history without an application id
> 
>
> Key: SPARK-32200
> URL: https://issues.apache.org/jira/browse/SPARK-32200
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
>  
> In the current master, when we access /history on the HistoryServer without 
> an application id, status code 400 is returned.
> It would be better to redirect to the history page instead for better UX.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org