[jira] [Comment Edited] (SPARK-25829) Duplicated map keys are not handled consistently
[ https://issues.apache.org/jira/browse/SPARK-25829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663118#comment-16663118 ]

Wenchen Fan edited comment on SPARK-25829 at 11/15/18 7:32 AM:
--------------------------------------------------------------

More investigation on "later entry wins". If we still allow duplicated keys in a map physically, the following functions need to be updated: Explode, PosExplode, GetMapValue, MapKeys, MapValues, MapEntries, TransformKeys, TransformValues, MapZipWith.

If we want to forbid duplicated keys in a map, the following functions need to be updated: CreateMap, MapFromArrays, MapFromEntries, MapFromString, MapConcat, TransformKeys, MapFilter, and also reading maps from data sources.

So the "later entry wins" semantic is more ideal but needs more work.

was (Author: cloud_fan):
More investigation on "later entry wins". If we still allow duplicated keys in a map physically, the following functions need to be updated: Explode, PosExplode, GetMapValue, MapKeys, MapValues, MapEntries, TransformKeys, TransformValues, MapZipWith.

If we want to forbid duplicated keys in a map, the following functions need to be updated: CreateMap, MapFromArrays, MapFromEntries, MapFromString, MapConcat, MapFilter, and also reading maps from data sources.

So the "later entry wins" semantic is more ideal but needs more work.

> Duplicated map keys are not handled consistently
> ------------------------------------------------
>
>                 Key: SPARK-25829
>                 URL: https://issues.apache.org/jira/browse/SPARK-25829
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Wenchen Fan
>            Priority: Major
>
> In Spark SQL, we apply the "earlier entry wins" semantic to duplicated map keys, e.g.
> {code}
> scala> sql("SELECT map(1,2,1,3)[1]").show
> +------------------+
> |map(1, 2, 1, 3)[1]|
> +------------------+
> |                 2|
> +------------------+
> {code}
> However, this handling is not applied consistently.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
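Outside Spark, the difference between the two candidate semantics can be sketched with a minimal, hypothetical Python illustration (plain dicts, not Spark code): "earlier entry wins" keeps the first value seen for a key, which is what `SELECT map(1,2,1,3)[1]` returning 2 corresponds to, while "later entry wins" is an unconditional overwrite.

```python
def map_earlier_entry_wins(pairs):
    """Keep the first value seen for each key (Spark SQL's current behavior)."""
    result = {}
    for key, value in pairs:
        if key not in result:  # later duplicates are ignored
            result[key] = value
    return result

def map_later_entry_wins(pairs):
    """Keep the last value seen for each key (the proposed semantic)."""
    result = {}
    for key, value in pairs:
        result[key] = value  # later duplicates overwrite
    return result

pairs = [(1, 2), (1, 3)]  # the map(1,2,1,3) example from the issue
print(map_earlier_entry_wins(pairs))  # {1: 2}
print(map_later_entry_wins(pairs))    # {1: 3}
```

Either way, the functions listed in the comment would have to agree on one of these two behaviors rather than each picking its own.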
[jira] [Assigned] (SPARK-26069) Flaky test: RpcIntegrationSuite.sendRpcWithStreamFailures
[ https://issues.apache.org/jira/browse/SPARK-26069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-26069:
------------------------------------

    Assignee: Shixiong Zhu  (was: Apache Spark)

> Flaky test: RpcIntegrationSuite.sendRpcWithStreamFailures
> ---------------------------------------------------------
>
>                 Key: SPARK-26069
>                 URL: https://issues.apache.org/jira/browse/SPARK-26069
>             Project: Spark
>          Issue Type: Test
>          Components: Tests
>    Affects Versions: 2.4.0
>            Reporter: Shixiong Zhu
>            Assignee: Shixiong Zhu
>            Priority: Major
>
> {code}
> sbt.ForkMain$ForkError: java.lang.AssertionError: expected:<1> but was:<2>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:834)
> 	at org.junit.Assert.assertEquals(Assert.java:645)
> 	at org.junit.Assert.assertEquals(Assert.java:631)
> 	at org.apache.spark.network.RpcIntegrationSuite.assertErrorAndClosed(RpcIntegrationSuite.java:386)
> 	at org.apache.spark.network.RpcIntegrationSuite.sendRpcWithStreamFailures(RpcIntegrationSuite.java:347)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 	at org.junit.runners.Suite.runChild(Suite.java:128)
> 	at org.junit.runners.Suite.runChild(Suite.java:27)
> 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> 	at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
> 	at com.novocode.junit.JUnitRunner$1.execute(JUnitRunner.java:132)
> 	at sbt.ForkMain$Run$2.call(ForkMain.java:296)
> 	at sbt.ForkMain$Run$2.call(ForkMain.java:286)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
[jira] [Commented] (SPARK-26069) Flaky test: RpcIntegrationSuite.sendRpcWithStreamFailures
[ https://issues.apache.org/jira/browse/SPARK-26069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687589#comment-16687589 ]

Apache Spark commented on SPARK-26069:
--------------------------------------

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/23041
[jira] [Assigned] (SPARK-26069) Flaky test: RpcIntegrationSuite.sendRpcWithStreamFailures
[ https://issues.apache.org/jira/browse/SPARK-26069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-26069:
------------------------------------

    Assignee: Apache Spark  (was: Shixiong Zhu)
[jira] [Created] (SPARK-26069) Flaky test: RpcIntegrationSuite.sendRpcWithStreamFailures
Shixiong Zhu created SPARK-26069:
---------------------------------

             Summary: Flaky test: RpcIntegrationSuite.sendRpcWithStreamFailures
                 Key: SPARK-26069
                 URL: https://issues.apache.org/jira/browse/SPARK-26069
             Project: Spark
          Issue Type: Test
          Components: Tests
    Affects Versions: 2.4.0
            Reporter: Shixiong Zhu
            Assignee: Shixiong Zhu
[jira] [Commented] (SPARK-26068) ChunkedByteBufferInputStream is truncated by empty chunk
[ https://issues.apache.org/jira/browse/SPARK-26068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687497#comment-16687497 ]

Apache Spark commented on SPARK-26068:
--------------------------------------

User 'linhong-intel' has created a pull request for this issue:
https://github.com/apache/spark/pull/23040

> ChunkedByteBufferInputStream is truncated by empty chunk
> --------------------------------------------------------
>
>                 Key: SPARK-26068
>                 URL: https://issues.apache.org/jira/browse/SPARK-26068
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Liu, Linhong
>            Priority: Major
>
> If ChunkedByteBuffer contains an empty chunk in the middle of it, then the
> ChunkedByteBufferInputStream will be truncated. All data behind the empty
> chunk will not be read.
> The problematic code:
> {code:java}
> // ChunkedByteBuffer.scala
> // Assume chunks.next returns an empty chunk; then we reach the else
> // branch regardless of whether chunks.hasNext is true, so some data is lost.
> override def read(dest: Array[Byte], offset: Int, length: Int): Int = {
>   if (currentChunk != null && !currentChunk.hasRemaining && chunks.hasNext) {
>     currentChunk = chunks.next()
>   }
>   if (currentChunk != null && currentChunk.hasRemaining) {
>     val amountToGet = math.min(currentChunk.remaining(), length)
>     currentChunk.get(dest, offset, amountToGet)
>     amountToGet
>   } else {
>     close()
>     -1
>   }
> }
> {code}
[jira] [Assigned] (SPARK-26068) ChunkedByteBufferInputStream is truncated by empty chunk
[ https://issues.apache.org/jira/browse/SPARK-26068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-26068:
------------------------------------

    Assignee: Apache Spark
[jira] [Commented] (SPARK-26068) ChunkedByteBufferInputStream is truncated by empty chunk
[ https://issues.apache.org/jira/browse/SPARK-26068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687498#comment-16687498 ]

Apache Spark commented on SPARK-26068:
--------------------------------------

User 'linhong-intel' has created a pull request for this issue:
https://github.com/apache/spark/pull/23040
[jira] [Assigned] (SPARK-26068) ChunkedByteBufferInputStream is truncated by empty chunk
[ https://issues.apache.org/jira/browse/SPARK-26068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-26068:
------------------------------------

    Assignee:     (was: Apache Spark)
[jira] [Created] (SPARK-26068) ChunkedByteBufferInputStream is truncated by empty chunk
Liu, Linhong created SPARK-26068:
---------------------------------

             Summary: ChunkedByteBufferInputStream is truncated by empty chunk
                 Key: SPARK-26068
                 URL: https://issues.apache.org/jira/browse/SPARK-26068
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.0.0
            Reporter: Liu, Linhong

If ChunkedByteBuffer contains an empty chunk in the middle of it, then the ChunkedByteBufferInputStream will be truncated. All data behind the empty chunk will not be read.

The problematic code:
{code:java}
// ChunkedByteBuffer.scala
// Assume chunks.next returns an empty chunk; then we reach the else
// branch regardless of whether chunks.hasNext is true, so some data is lost.
override def read(dest: Array[Byte], offset: Int, length: Int): Int = {
  if (currentChunk != null && !currentChunk.hasRemaining && chunks.hasNext) {
    currentChunk = chunks.next()
  }
  if (currentChunk != null && currentChunk.hasRemaining) {
    val amountToGet = math.min(currentChunk.remaining(), length)
    currentChunk.get(dest, offset, amountToGet)
    amountToGet
  } else {
    close()
    -1
  }
}
{code}
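The truncation can be demonstrated in miniature with a hypothetical Python re-implementation of the chunked read loop (an illustrative sketch, not the actual Scala class): the buggy variant advances past at most one exhausted chunk per read, so an empty chunk in the middle is mistaken for end-of-stream, while the fixed variant keeps advancing until it finds data or the iterator is truly exhausted.

```python
def read_all(chunks, skip_empty=True):
    """Drain an iterator of byte chunks. With skip_empty=False this mirrors
    the buggy read(): a single empty chunk is mistaken for end-of-stream."""
    it = iter(chunks)
    out = bytearray()
    current = b""
    while True:
        if skip_empty:
            # Fixed behavior: loop past any number of empty chunks.
            while not current:
                try:
                    current = next(it)
                except StopIteration:
                    return bytes(out)  # genuinely exhausted
        else:
            # Buggy behavior: advance at most once, like the single 'if'.
            if not current:
                try:
                    current = next(it)
                except StopIteration:
                    return bytes(out)
            if not current:
                return bytes(out)  # empty chunk wrongly treated as EOF
        out += current
        current = b""

data = [b"ab", b"", b"cd"]
print(read_all(data))                    # b'abcd'
print(read_all(data, skip_empty=False))  # b'ab' -- data after the empty chunk is lost
```

The one-character essence of the fix is turning the `if` that fetches the next chunk into a loop, which is what the linked pull request does for the Scala code.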
[jira] [Resolved] (SPARK-26036) Break large tests.py files into smaller files
[ https://issues.apache.org/jira/browse/SPARK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-26036. -- Resolution: Fixed Issue resolved by pull request 23033 [https://github.com/apache/spark/pull/23033] > Break large tests.py files into smaller files > - > > Key: SPARK-26036 > URL: https://issues.apache.org/jira/browse/SPARK-26036 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Spark Core >Affects Versions: 2.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26036) Break large tests.py files into smaller files
[ https://issues.apache.org/jira/browse/SPARK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-26036: Assignee: Hyukjin Kwon > Break large tests.py files into smaller files > - > > Key: SPARK-26036 > URL: https://issues.apache.org/jira/browse/SPARK-26036 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Spark Core >Affects Versions: 2.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26067) Pandas GROUPED_MAP udf breaks if DF has >255 columns
[ https://issues.apache.org/jira/browse/SPARK-26067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abdeali Kothari updated SPARK-26067: Description: When I run Spark's Pandas GROUPED_MAP UDFs to apply a UDAF I wrote in Python/pandas on a grouped DataFrame in Spark, it fails if the number of columns is greater than 255 on Python 3.6 and lower. {code:java} import pyspark from pyspark.sql import types as T, functions as F spark = pyspark.sql.SparkSession.builder.getOrCreate() df = spark.createDataFrame( [[i for i in range(256)], [i+1 for i in range(256)]], schema=["a" + str(i) for i in range(256)]) new_schema = T.StructType([ field for field in df.schema] + [T.StructField("new_row", T.DoubleType())]) def myfunc(df): df['new_row'] = 1 return df myfunc_udf = F.pandas_udf(new_schema, F.PandasUDFType.GROUPED_MAP)(myfunc) df2 = df.groupBy(["a1"]).apply(myfunc_udf) print(df2.count()) # This FAILS # ERROR: # Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last): # File "/usr/local/hadoop/spark2.3.1/python/lib/pyspark.zip/pyspark/worker.py", line 219, in main # func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type) # File "/usr/local/hadoop/spark2.3.1/python/lib/pyspark.zip/pyspark/worker.py", line 148, in read_udfs # mapper = eval(mapper_str, udfs) # File "", line 1 # SyntaxError: more than 255 arguments {code} Note: In Python 3.7 the 255-argument limit was raised, but I have not tried with Python 3.7 ...https://docs.python.org/3.7/whatsnew/3.7.html#other-language-changes I was using Python 3.5 (from Anaconda) and Spark 2.3.1 to reproduce this on my Hadoop Linux cluster and also on my Mac standalone Spark installation. was: When I run Spark's Pandas GROUPED_MAP UDFs to apply a UDAF I wrote in Python/pandas on a grouped DataFrame in Spark, it fails if the number of columns is greater than 255 on Python 3.6 and lower. 
{code:java} import pyspark from pyspark.sql import types as T, functions as F spark = pyspark.sql.SparkSession.builder.getOrCreate() df = spark.createDataFrame( [[i for i in range(256)], [i+1 for i in range(256)]], schema=["a" + str(i) for i in range(256)]) new_schema = T.StructType([ field for field in df.schema] + [T.StructField("new_row", T.DoubleType())]) def myfunc(df): df['new_row'] = 1 return df myfunc_udf = F.pandas_udf(new_schema, F.PandasUDFType.GROUPED_MAP)(myfunc) df2 = df.groupBy(["a1"]).apply(myfunc_udf) print(df2.count()) # This FAILS # ERROR: # Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last): # File "/usr/local/hadoop/spark2.3.1/python/lib/pyspark.zip/pyspark/worker.py", line 219, in main # func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type) # File "/usr/local/hadoop/spark2.3.1/python/lib/pyspark.zip/pyspark/worker.py", line 148, in read_udfs # mapper = eval(mapper_str, udfs) # File "", line 1 # SyntaxError: more than 255 arguments {code} I believe this is happening because internally this creates a UDF with every column in the DF as an input. https://github.com/apache/spark/blob/41c2227a2318029709553a588e44dee28f106350/python/pyspark/sql/group.py#L274 Note: In Python 3.7 the 255-argument limit was raised, but I have not tried with Python 3.7 ...https://docs.python.org/3.7/whatsnew/3.7.html#other-language-changes I was using Python 3.5 (from Anaconda) and Spark 2.3.1 to reproduce this on my Hadoop Linux cluster and also on my Mac standalone Spark installation. 
> Pandas GROUPED_MAP udf breaks if DF has >255 columns > > > Key: SPARK-26067 > URL: https://issues.apache.org/jira/browse/SPARK-26067 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.3.2, 2.4.0 >Reporter: Abdeali Kothari >Priority: Major > > When I run Spark's Pandas GROUPED_MAP UDFs to apply a UDAF I wrote in > Python/pandas on a grouped DataFrame in Spark, it fails if the number of > columns is greater than 255 on Python 3.6 and lower. > {code:java} > import pyspark > from pyspark.sql import types as T, functions as F > spark = pyspark.sql.SparkSession.builder.getOrCreate() > df = spark.createDataFrame( > [[i for i in range(256)], [i+1 for i in range(256)]], schema=["a" + > str(i) for i in range(256)]) > new_schema = T.StructType([ > field for field in df.schema] + [T.StructField("new_row", > T.DoubleType())]) > def myfunc(df): > df['new_row'] = 1 > return df > myfunc_udf = F.pandas_udf(new_schema, F.PandasUDFType.GROUPED_MAP)(myfunc) > df2 = df.groupBy(["a1"]).apply(myfunc_udf) > print(df2.count()) # This FAILS > # ERROR: > # Caused by: org.apache.spark.api.python.PythonException: Traceback (most > recent call last): > # File >
[jira] [Created] (SPARK-26067) Pandas GROUPED_MAP udf breaks if DF has >255 columns
Abdeali Kothari created SPARK-26067: --- Summary: Pandas GROUPED_MAP udf breaks if DF has >255 columns Key: SPARK-26067 URL: https://issues.apache.org/jira/browse/SPARK-26067 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.4.0, 2.3.2 Reporter: Abdeali Kothari When I run Spark's Pandas GROUPED_MAP UDFs to apply a UDAF I wrote in Python/pandas on a grouped DataFrame in Spark, it fails if the number of columns is greater than 255 on Python 3.6 and lower.
{code:java}
import pyspark
from pyspark.sql import types as T, functions as F

spark = pyspark.sql.SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [[i for i in range(256)], [i+1 for i in range(256)]],
    schema=["a" + str(i) for i in range(256)])
new_schema = T.StructType(
    [field for field in df.schema] + [T.StructField("new_row", T.DoubleType())])

def myfunc(df):
    df['new_row'] = 1
    return df

myfunc_udf = F.pandas_udf(new_schema, F.PandasUDFType.GROUPED_MAP)(myfunc)
df2 = df.groupBy(["a1"]).apply(myfunc_udf)
print(df2.count())  # This FAILS
# ERROR:
# Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
#   File "/usr/local/hadoop/spark2.3.1/python/lib/pyspark.zip/pyspark/worker.py", line 219, in main
#     func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
#   File "/usr/local/hadoop/spark2.3.1/python/lib/pyspark.zip/pyspark/worker.py", line 148, in read_udfs
#     mapper = eval(mapper_str, udfs)
#   File "", line 1
# SyntaxError: more than 255 arguments
{code}
I believe this is happening because internally this creates a UDF with every column in the DF as an input. 
https://github.com/apache/spark/blob/41c2227a2318029709553a588e44dee28f106350/python/pyspark/sql/group.py#L274 Note: In Python 3.7 the 255-argument limit was raised, but I have not tried with Python 3.7 ...https://docs.python.org/3.7/whatsnew/3.7.html#other-language-changes I was using Python 3.5 (from Anaconda) and Spark 2.3.1 to reproduce this on my Hadoop Linux cluster and also on my Mac standalone Spark installation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
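The SyntaxError in the traceback comes from a CPython limit rather than from Spark itself: before Python 3.7, a call expression could not have more than 255 arguments, and the generated `mapper` passes every DataFrame column as an argument. A small sketch (pure Python, no Spark needed) that reproduces the compile-time check:

```python
import sys

def call_compiles(n_args):
    """Return True if a call expression with n_args positional arguments
    compiles. On CPython 3.6 and lower, more than 255 arguments raises
    "SyntaxError: more than 255 arguments" -- the error quoted above."""
    args = ", ".join("a%d" % i for i in range(n_args))
    src = "f(%s)" % args  # e.g. "f(a0, a1, ..., a255)"
    try:
        compile(src, "<generated>", "eval")
        return True
    except SyntaxError:
        return False
```

On Python 3.7 and later (where the limit was lifted, per the "What's New" link above) both 255 and 256 arguments compile, which is consistent with the reporter's note.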
[jira] [Commented] (SPARK-26017) SVD++ error rate is high in the test suite.
[ https://issues.apache.org/jira/browse/SPARK-26017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687361#comment-16687361 ] shahid commented on SPARK-26017: I am analyzing it > SVD++ error rate is high in the test suite. > --- > > Key: SPARK-26017 > URL: https://issues.apache.org/jira/browse/SPARK-26017 > Project: Spark > Issue Type: Bug > Components: GraphX >Affects Versions: 2.3.2 >Reporter: shahid >Priority: Major > Attachments: image-2018-11-12-20-41-49-370.png > > > In the test suite, "{color:#008000}Test SVD++ with mean square error on > training set", {color}the error rate is quite high, even for a large number of > iterations. > > !image-2018-11-12-20-41-49-370.png! > > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25957) Skip building spark-r docker image if spark distribution does not have R support
[ https://issues.apache.org/jira/browse/SPARK-25957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687337#comment-16687337 ] Nagaram Prasad Addepally commented on SPARK-25957: -- Thanks [~vanzin]... we can do skip flags instead. I think we can auto-detect the R installation by checking for the presence of the "$SPARK_HOME/R/lib" folder. Correct me if I am wrong. I will work on this change and post a PR. Can you assign this Jira to me? I do not seem to have permission to assign this Jira to myself. > Skip building spark-r docker image if spark distribution does not have R > support > > > Key: SPARK-25957 > URL: https://issues.apache.org/jira/browse/SPARK-25957 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Nagaram Prasad Addepally >Priority: Major > > [docker-image-tool.sh|https://github.com/apache/spark/blob/master/bin/docker-image-tool.sh] > script by default tries to build spark-r image. We may not always build > spark distribution with R support. It would be good to skip building and > publishing spark-r images if R support is not available in the spark > distribution. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25956) Make Scala 2.12 as default Scala version in Spark 3.0
[ https://issues.apache.org/jira/browse/SPARK-25956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-25956. --- Resolution: Fixed Assignee: DB Tsai Fix Version/s: 3.0.0 This is resolved via https://github.com/apache/spark/pull/22967 > Make Scala 2.12 as default Scala version in Spark 3.0 > - > > Key: SPARK-25956 > URL: https://issues.apache.org/jira/browse/SPARK-25956 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 2.4.0 >Reporter: DB Tsai >Assignee: DB Tsai >Priority: Major > Fix For: 3.0.0 > > > Scala 2.11 is unlikely to support Java 11 > (https://github.com/scala/scala-dev/issues/559#issuecomment-436160166); hence, > we will make Scala 2.12 the default Scala version in Spark 3.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-25957) Skip building spark-r docker image if spark distribution does not have R support
[ https://issues.apache.org/jira/browse/SPARK-25957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687306#comment-16687306 ] Marcelo Vanzin edited comment on SPARK-25957 at 11/15/18 12:16 AM: --- I prefer to keep the current behavior and add options to disable specific images (e.g. "\-\-skip-r", "\-\-skip-pyspark"). If "\-\-skip-r" could be auto-detected, even better. was (Author: vanzin): I prefer to keep the current behavior and add options to disable specific images (e.g. "--skip-r", "--skip-pyspark"). If "--skip-r" could be auto-detected, even better. > Skip building spark-r docker image if spark distribution does not have R > support > > > Key: SPARK-25957 > URL: https://issues.apache.org/jira/browse/SPARK-25957 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Nagaram Prasad Addepally >Priority: Major > > [docker-image-tool.sh|https://github.com/apache/spark/blob/master/bin/docker-image-tool.sh] > script by default tries to build spark-r image. We may not always build > spark distribution with R support. It would be good to skip building and > publishing spark-r images if R support is not available in the spark > distribution. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25957) Skip building spark-r docker image if spark distribution does not have R support
[ https://issues.apache.org/jira/browse/SPARK-25957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687306#comment-16687306 ] Marcelo Vanzin commented on SPARK-25957: I prefer to keep the current behavior and add options to disable specific images (e.g. "--skip-r", "--skip-pyspark"). If "--skip-r" could be auto-detected, even better. > Skip building spark-r docker image if spark distribution does not have R > support > > > Key: SPARK-25957 > URL: https://issues.apache.org/jira/browse/SPARK-25957 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Nagaram Prasad Addepally >Priority: Major > > [docker-image-tool.sh|https://github.com/apache/spark/blob/master/bin/docker-image-tool.sh] > script by default tries to build spark-r image. We may not always build > spark distribution with R support. It would be good to skip building and > publishing spark-r images if R support is not available in the spark > distribution. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25957) Skip building spark-r docker image if spark distribution does not have R support
[ https://issues.apache.org/jira/browse/SPARK-25957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687276#comment-16687276 ] Nagaram Prasad Addepally commented on SPARK-25957: -- I think we can parameterize what images we want to build and publish using [docker-image-tool.sh|https://github.com/apache/spark/blob/master/bin/docker-image-tool.sh]. By default, we can build and publish all images (to keep existing behavior intact) and provide an override option to specify which images we want to build explicitly. Note that we will always build the base Spark (JVM) docker image. For example, {noformat} ./docker-image-tool.sh -r -t build|publish # Builds/publishes all docker images ./docker-image-tool.sh -r -t --select [p,R] build|publish # Builds/publishes docker images specified in select param. We will always build spark base (JVM) docker image.{noformat} Does this approach sound reasonable, or does anyone have a better suggestion? > Skip building spark-r docker image if spark distribution does not have R > support > > > Key: SPARK-25957 > URL: https://issues.apache.org/jira/browse/SPARK-25957 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Nagaram Prasad Addepally >Priority: Major > > [docker-image-tool.sh|https://github.com/apache/spark/blob/master/bin/docker-image-tool.sh] > script by default tries to build spark-r image. We may not always build > spark distribution with R support. It would be good to skip building and > publishing spark-r images if R support is not available in the spark > distribution. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
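A sketch of the selection logic being discussed, in Python for illustration only: the real docker-image-tool.sh is a Bash script, and the function and flag names here are hypothetical. R support is auto-detected from the presence of R/lib under SPARK_HOME, as suggested in the comments above:

```python
import os

def has_r_support(spark_home):
    # Auto-detection suggested in the comments: an R-enabled Spark
    # distribution ships an R/lib directory under SPARK_HOME.
    return os.path.isdir(os.path.join(spark_home, "R", "lib"))

def images_to_build(r_supported, skip_pyspark=False, skip_r=False):
    """Return the list of docker images to build. The base JVM image is
    always built; spark-r is skipped when R support is absent or the
    (hypothetical) --skip-r flag is given."""
    images = ["spark"]  # base JVM image, always built
    if not skip_pyspark:
        images.append("spark-py")
    if not skip_r and r_supported:
        images.append("spark-r")
    return images
```

Usage would look like `images_to_build(has_r_support(os.environ.get("SPARK_HOME", "")))`, so a distribution without R never attempts the spark-r build while the default behavior for full distributions is unchanged.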
[jira] [Commented] (SPARK-26066) Moving truncatedString to sql/catalyst
[ https://issues.apache.org/jira/browse/SPARK-26066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687169#comment-16687169 ] Apache Spark commented on SPARK-26066: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/23039 > Moving truncatedString to sql/catalyst > -- > > Key: SPARK-26066 > URL: https://issues.apache.org/jira/browse/SPARK-26066 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Priority: Minor > > The truncatedString method is used to convert elements of TreeNodes and > expressions to strings, and is called only from sql.* packages. The ticket aims > to move the method out of core. We also need to introduce a SQL config to > control the maximum number of fields by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
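For readers unfamiliar with the helper: truncatedString joins up to a maximum number of fields and summarizes the rest, which is why a config for the default maximum matters. A rough Python approximation of that behavior (the actual Scala signature and exact output format may differ):

```python
def truncated_string(fields, max_fields, sep=", "):
    """Join at most max_fields elements; if there are more, show the first
    max_fields - 1 and a "... N more fields" summary. This is an
    approximation of Spark's Scala helper, not its exact output."""
    fields = [str(f) for f in fields]
    if len(fields) <= max_fields:
        return sep.join(fields)
    num_shown = max(0, max_fields - 1)
    hidden = len(fields) - num_shown
    return sep.join(fields[:num_shown] + ["... %d more fields" % hidden])
```

For example, truncating a four-field list to three fields yields something like "a, b, ... 2 more fields", keeping plan strings bounded no matter how wide the schema is.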
[jira] [Assigned] (SPARK-26066) Moving truncatedString to sql/catalyst
[ https://issues.apache.org/jira/browse/SPARK-26066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26066: Assignee: (was: Apache Spark) > Moving truncatedString to sql/catalyst > -- > > Key: SPARK-26066 > URL: https://issues.apache.org/jira/browse/SPARK-26066 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Priority: Minor > > The truncatedString method is used to convert elements of TreeNodes and > expressions to strings, and is called only from sql.* packages. The ticket aims > to move the method out of core. We also need to introduce a SQL config to > control the maximum number of fields by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26066) Moving truncatedString to sql/catalyst
[ https://issues.apache.org/jira/browse/SPARK-26066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26066: Assignee: Apache Spark > Moving truncatedString to sql/catalyst > -- > > Key: SPARK-26066 > URL: https://issues.apache.org/jira/browse/SPARK-26066 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Minor > > The truncatedString method is used to convert elements of TreeNodes and > expressions to strings, and is called only from sql.* packages. The ticket aims > to move the method out of core. We also need to introduce a SQL config to > control the maximum number of fields by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25451) Stages page doesn't show the right number of the total tasks
[ https://issues.apache.org/jira/browse/SPARK-25451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25451: Assignee: Apache Spark > Stages page doesn't show the right number of the total tasks > > > Key: SPARK-25451 > URL: https://issues.apache.org/jira/browse/SPARK-25451 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.1 >Reporter: zuotingbing >Assignee: Apache Spark >Priority: Major > Attachments: mshot.png > > > > See the attached pic. > !mshot.png! > Executor 1 has 7 tasks, but on the Stages page the total tasks for the > executor is 6. > > To reproduce this, simply start a shell: > {code:java} > $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g > --total-executor-cores 2 --master spark://localhost.localdomain:7077{code} > Run a job as follows: > {code:java} > sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad > executor")}.collect() {code} > > Go to the stages page and you will see the Total Tasks value is not right in > the {code:java} > Aggregated Metrics by Executor{code} > table. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25451) Stages page doesn't show the right number of the total tasks
[ https://issues.apache.org/jira/browse/SPARK-25451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687130#comment-16687130 ] Apache Spark commented on SPARK-25451: -- User 'shahidki31' has created a pull request for this issue: https://github.com/apache/spark/pull/23038 > Stages page doesn't show the right number of the total tasks > > > Key: SPARK-25451 > URL: https://issues.apache.org/jira/browse/SPARK-25451 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.1 >Reporter: zuotingbing >Priority: Major > Attachments: mshot.png > > > > See the attached pic. > !mshot.png! > Executor 1 has 7 tasks, but on the Stages page the total tasks for the > executor is 6. > > To reproduce this, simply start a shell: > {code:java} > $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g > --total-executor-cores 2 --master spark://localhost.localdomain:7077{code} > Run a job as follows: > {code:java} > sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad > executor")}.collect() {code} > > Go to the stages page and you will see the Total Tasks value is not right in > the {code:java} > Aggregated Metrics by Executor{code} > table. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25451) Stages page doesn't show the right number of the total tasks
[ https://issues.apache.org/jira/browse/SPARK-25451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25451: Assignee: (was: Apache Spark) > Stages page doesn't show the right number of the total tasks > > > Key: SPARK-25451 > URL: https://issues.apache.org/jira/browse/SPARK-25451 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.1 >Reporter: zuotingbing >Priority: Major > Attachments: mshot.png > > > > See the attached pic. > !mshot.png! > Executor 1 has 7 tasks, but on the Stages page the total tasks for the > executor is 6. > > To reproduce this, simply start a shell: > {code:java} > $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g > --total-executor-cores 2 --master spark://localhost.localdomain:7077{code} > Run a job as follows: > {code:java} > sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad > executor")}.collect() {code} > > Go to the stages page and you will see the Total Tasks value is not right in > the {code:java} > Aggregated Metrics by Executor{code} > table. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25451) Stages page doesn't show the right number of the total tasks
[ https://issues.apache.org/jira/browse/SPARK-25451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687128#comment-16687128 ] Apache Spark commented on SPARK-25451: -- User 'shahidki31' has created a pull request for this issue: https://github.com/apache/spark/pull/23038 > Stages page doesn't show the right number of the total tasks > > > Key: SPARK-25451 > URL: https://issues.apache.org/jira/browse/SPARK-25451 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.1 >Reporter: zuotingbing >Priority: Major > Attachments: mshot.png > > > > See the attached pic. > !mshot.png! > Executor 1 has 7 tasks, but on the Stages page the total tasks for the > executor is 6. > > To reproduce this, simply start a shell: > {code:java} > $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g > --total-executor-cores 2 --master spark://localhost.localdomain:7077{code} > Run a job as follows: > {code:java} > sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad > executor")}.collect() {code} > > Go to the stages page and you will see the Total Tasks value is not right in > the {code:java} > Aggregated Metrics by Executor{code} > table. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25986) Banning throw new Errors
[ https://issues.apache.org/jira/browse/SPARK-25986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-25986. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 22989 [https://github.com/apache/spark/pull/22989] > Banning throw new Errors > > > Key: SPARK-25986 > URL: https://issues.apache.org/jira/browse/SPARK-25986 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Xiao Li >Assignee: Yuanjian Li >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > Adding a linter rule to ban the construction of new Errors and then make sure > that we throw the correct exceptions. See the PR > https://github.com/apache/spark/pull/22969 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25986) Banning throw new Errors
[ https://issues.apache.org/jira/browse/SPARK-25986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-25986: -- Docs Text: Release notes text: Certain methods in Spark MLlib would throw NotImplementedError or UnknownError on illegal input. These have been changed to more standard UnsupportedOperationException and IllegalArgumentException. > Banning throw new Errors > > > Key: SPARK-25986 > URL: https://issues.apache.org/jira/browse/SPARK-25986 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Xiao Li >Assignee: Yuanjian Li >Priority: Major > Labels: release-notes > > Adding a linter rule to ban the construction of new Errors and then make sure > that we throw the correct exceptions. See the PR > https://github.com/apache/spark/pull/22969 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
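The linter rule itself lives in the Spark build (see the PR linked in the issue); as a rough illustration of its intent only, a regex check like the following flags constructions of java.lang.Error subclasses in Scala sources so they can be replaced with standard exceptions. This sketch is not the actual rule:

```python
import re

# Matches e.g. 'throw new UnknownError(...)' or 'throw new NotImplementedError',
# but not 'throw new IllegalArgumentException(...)'.
BANNED_THROW = re.compile(r"throw\s+new\s+\w*Error\b")

def find_banned_throws(source):
    """Return 1-indexed line numbers in a Scala source string that
    construct and throw an Error subclass."""
    return [i for i, line in enumerate(source.splitlines(), start=1)
            if BANNED_THROW.search(line)]
```

Per the release notes above, the offending sites were migrated to UnsupportedOperationException and IllegalArgumentException, which a check like this would leave untouched.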
[jira] [Assigned] (SPARK-25778) WriteAheadLogBackedBlockRDD in YARN Cluster Mode Fails due lack of access to tmpDir from $PWD to HDFS
[ https://issues.apache.org/jira/browse/SPARK-25778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned SPARK-25778: -- Assignee: Greg Senia > WriteAheadLogBackedBlockRDD in YARN Cluster Mode Fails due lack of access to > tmpDir from $PWD to HDFS > - > > Key: SPARK-25778 > URL: https://issues.apache.org/jira/browse/SPARK-25778 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming, YARN >Affects Versions: 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 2.2.1, 2.2.2, 2.3.1, > 2.3.2 >Reporter: Greg Senia >Assignee: Greg Senia >Priority: Major > Fix For: 2.4.1, 3.0.0 > > Attachments: SPARK-25778.2.patch, SPARK-25778.4.patch, > SPARK-25778.patch > > > WriteAheadLogBackedBlockRDD in YARN cluster mode fails due to lack of access to > the HDFS path, because it uses a name similar to the $PWD folder from the YARN AM > in cluster mode. > While attempting to use Spark Streaming and WriteAheadLogs, I noticed the > following errors after the driver attempted to recover the already-read data > that was being written to HDFS in the checkpoint folder. After spending many > hours looking at the cause of the error below, due to the fact that the > parent folder /hadoop exists in our HDFS FS, I wonder if it is possible to > make an option configurable to choose an alternate bogus directory that will > never be used. 
> hadoop fs -ls / > drwx-- - dsadmdsadm 0 2017-06-20 13:20 /hadoop > hadoop fs -ls /hadoop/apps > drwx-- - dsadm dsadm 0 2017-06-20 13:20 /hadoop/apps > streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala > val nonExistentDirectory = new File( > System.getProperty("java.io.tmpdir"), > UUID.randomUUID().toString).getAbsolutePath > writeAheadLog = WriteAheadLogUtils.createLogForReceiver( > SparkEnv.get.conf, nonExistentDirectory, hadoopConf) > dataRead = writeAheadLog.read(partition.walRecordHandle) > 18/10/19 00:03:03 DEBUG YarnSchedulerBackend$YarnDriverEndpoint: Launching > task 72 on executor id: 1 hostname: ha20t5002dn.tech.hdp.example.com. > 18/10/19 00:03:03 DEBUG BlockManager: Getting local block broadcast_4_piece0 > as bytes > 18/10/19 00:03:03 DEBUG BlockManager: Level for block broadcast_4_piece0 is > StorageLevel(disk, memory, 1 replicas) > 18/10/19 00:03:03 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory > on ha20t5002dn.tech.hdp.example.com:32768 (size: 33.7 KB, free: 912.2 MB) > 18/10/19 00:03:03 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 71, > ha20t5002dn.tech.hdp.example.com, executor 1): > org.apache.spark.SparkException: Could not read data from write ahead log > record > FileBasedWriteAheadLogSegment(hdfs://tech/user/hdpdevspark/sparkstreaming/Spark_Streaming_MQ_IDMS/receivedData/0/log-1539921695606-1539921755606,0,1017) > at > org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.org$apache$spark$streaming$rdd$WriteAheadLogBackedBlockRDD$$getBlockFromWriteAheadLog$1(WriteAheadLogBackedBlockRDD.scala:145) > at > org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD$$anonfun$compute$1.apply(WriteAheadLogBackedBlockRDD.scala:173) > at > org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD$$anonfun$compute$1.apply(WriteAheadLogBackedBlockRDD.scala:173) > at scala.Option.getOrElse(Option.scala:121) > at > 
org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.compute(WriteAheadLogBackedBlockRDD.scala:173) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:108) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.security.AccessControlException: Permission > denied: user=hdpdevspark, access=EXECUTE, > inode="/hadoop/diskc/hadoop/yarn/local/usercache/hdpdevspark/appcache/application_1539554105597_0338/container_e322_1539554105597_0338_01_02/tmp/170f36b8-9202-4556-89a4-64587c7136b6":dsadm:dsadm:drwx-- > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259) > at >
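The pattern quoted from WriteAheadLogBackedBlockRDD.scala builds a path that is intended never to exist, purely to satisfy the WAL API; the problem reported here is that java.io.tmpdir resolves under the YARN container's working directory, which collides with a restricted HDFS-side path. The proposed improvement is a configurable base directory. A Python sketch of that idea (the function name and default are hypothetical, for illustration only):

```python
import os
import uuid

def non_existent_directory(base_dir=None):
    """Build a unique path that is never created, mirroring the
    File(java.io.tmpdir, UUID.randomUUID()) pattern quoted above, but with
    a configurable base so deployments can point it at a harmless location
    instead of the container-local tmpdir."""
    base = base_dir if base_dir is not None else "/tmp"  # stand-in for java.io.tmpdir
    return os.path.join(base, str(uuid.uuid4()))
```

Because the UUID makes every call unique, the returned path is effectively guaranteed not to exist, and the configurable base avoids accidentally resolving under a directory the job cannot access.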
[jira] [Resolved] (SPARK-25778) WriteAheadLogBackedBlockRDD in YARN Cluster Mode Fails due lack of access to tmpDir from $PWD to HDFS
[ https://issues.apache.org/jira/browse/SPARK-25778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-25778. Resolution: Fixed Fix Version/s: 2.4.1 3.0.0 Issue resolved by pull request 22867 [https://github.com/apache/spark/pull/22867] > WriteAheadLogBackedBlockRDD in YARN Cluster Mode Fails due lack of access to > tmpDir from $PWD to HDFS > - > > Key: SPARK-25778 > URL: https://issues.apache.org/jira/browse/SPARK-25778 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming, YARN >Affects Versions: 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 2.2.1, 2.2.2, 2.3.1, > 2.3.2 >Reporter: Greg Senia >Priority: Major > Fix For: 3.0.0, 2.4.1 > > Attachments: SPARK-25778.2.patch, SPARK-25778.4.patch, > SPARK-25778.patch > > > WriteAheadLogBackedBlockRDD in YARN cluster mode fails due to lack of access > to an HDFS path, because it uses a name similar to the $PWD folder of the > YARN AM in cluster mode. > While attempting to use Spark Streaming and WriteAheadLogs, I noticed the > following errors after the driver attempted to recover the already-read data > that was being written to HDFS in the checkpoint folder. After spending many > hours looking at the cause of the error below (the parent folder /hadoop > exists in our HDFS FS), I wonder whether it's possible to add a configurable > option to choose an alternate bogus directory that will > never be used. 
> hadoop fs -ls / > drwx-- - dsadm dsadm 0 2017-06-20 13:20 /hadoop > hadoop fs -ls /hadoop/apps > drwx-- - dsadm dsadm 0 2017-06-20 13:20 /hadoop/apps > streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala > val nonExistentDirectory = new File( > System.getProperty("java.io.tmpdir"), > UUID.randomUUID().toString).getAbsolutePath > writeAheadLog = WriteAheadLogUtils.createLogForReceiver( > SparkEnv.get.conf, nonExistentDirectory, hadoopConf) > dataRead = writeAheadLog.read(partition.walRecordHandle) > 18/10/19 00:03:03 DEBUG YarnSchedulerBackend$YarnDriverEndpoint: Launching > task 72 on executor id: 1 hostname: ha20t5002dn.tech.hdp.example.com. > 18/10/19 00:03:03 DEBUG BlockManager: Getting local block broadcast_4_piece0 > as bytes > 18/10/19 00:03:03 DEBUG BlockManager: Level for block broadcast_4_piece0 is > StorageLevel(disk, memory, 1 replicas) > 18/10/19 00:03:03 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory > on ha20t5002dn.tech.hdp.example.com:32768 (size: 33.7 KB, free: 912.2 MB) > 18/10/19 00:03:03 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 71, > ha20t5002dn.tech.hdp.example.com, executor 1): > org.apache.spark.SparkException: Could not read data from write ahead log > record > FileBasedWriteAheadLogSegment(hdfs://tech/user/hdpdevspark/sparkstreaming/Spark_Streaming_MQ_IDMS/receivedData/0/log-1539921695606-1539921755606,0,1017) > at > org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.org$apache$spark$streaming$rdd$WriteAheadLogBackedBlockRDD$$getBlockFromWriteAheadLog$1(WriteAheadLogBackedBlockRDD.scala:145) > at > org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD$$anonfun$compute$1.apply(WriteAheadLogBackedBlockRDD.scala:173) > at > org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD$$anonfun$compute$1.apply(WriteAheadLogBackedBlockRDD.scala:173) > at scala.Option.getOrElse(Option.scala:121) > at > 
org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.compute(WriteAheadLogBackedBlockRDD.scala:173) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:108) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.security.AccessControlException: Permission > denied: user=hdpdevspark, access=EXECUTE, > inode="/hadoop/diskc/hadoop/yarn/local/usercache/hdpdevspark/appcache/application_1539554105597_0338/container_e322_1539554105597_0338_01_02/tmp/170f36b8-9202-4556-89a4-64587c7136b6":dsadm:dsadm:drwx-- > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319) > at >
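The snippet quoted in the report hard-codes `java.io.tmpdir` as the base of the dummy path. The gist of the requested fix can be sketched as follows; the helper name and the configurable base are illustrative, not Spark's actual API:

```scala
import java.io.File
import java.util.UUID

// Hypothetical helper: build the "non-existent directory" under a
// configurable base instead of a hard-coded java.io.tmpdir, so the dummy
// path can be pointed away from HDFS paths (like /hadoop/...) that the
// submitting user cannot traverse.
def nonExistentDirectory(base: String = System.getProperty("java.io.tmpdir")): String =
  new File(base, UUID.randomUUID().toString).getAbsolutePath

// Point the dummy path at a base the user controls:
val dir = nonExistentDirectory("/custom/scratch")
```

Because the path ends in a fresh UUID, two calls never collide, and the directory is never actually created; only its name is handed to the write-ahead-log reader.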
[jira] [Resolved] (SPARK-24421) Accessing sun.misc.Cleaner in JDK11
[ https://issues.apache.org/jira/browse/SPARK-24421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-24421. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 22993 [https://github.com/apache/spark/pull/22993] > Accessing sun.misc.Cleaner in JDK11 > --- > > Key: SPARK-24421 > URL: https://issues.apache.org/jira/browse/SPARK-24421 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.0.0 >Reporter: DB Tsai >Assignee: Sean Owen >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > Many internal APIs such as unsafe are encapsulated in JDK9+, see > http://openjdk.java.net/jeps/260 for detail. > To use Unsafe, we need to add *jdk.unsupported* to our code’s module > declaration: > {code:java} > module java9unsafe { > requires jdk.unsupported; > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
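For code that cannot declare a module (e.g. classpath applications like Spark), the usual route to `Unsafe` is still reflection. A minimal sketch, which on JDK 9+ works only because the `jdk.unsupported` module exports and opens `sun.misc`:

```scala
// Reflectively grab the sun.misc.Unsafe singleton. On JDK 9+ this relies
// on jdk.unsupported (resolved by default for classpath applications);
// named modules must declare `requires jdk.unsupported` as shown above.
val field = Class.forName("sun.misc.Unsafe").getDeclaredField("theUnsafe")
field.setAccessible(true)
val unsafe = field.get(null)
```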
[jira] [Assigned] (SPARK-24421) Accessing sun.misc.Cleaner in JDK11
[ https://issues.apache.org/jira/browse/SPARK-24421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-24421: - Assignee: Sean Owen > Accessing sun.misc.Cleaner in JDK11 > --- > > Key: SPARK-24421 > URL: https://issues.apache.org/jira/browse/SPARK-24421 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.0.0 >Reporter: DB Tsai >Assignee: Sean Owen >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > Many internal APIs such as unsafe are encapsulated in JDK9+, see > http://openjdk.java.net/jeps/260 for detail. > To use Unsafe, we need to add *jdk.unsupported* to our code’s module > declaration: > {code:java} > module java9unsafe { > requires jdk.unsupported; > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26066) Moving truncatedString to sql/catalyst
Maxim Gekk created SPARK-26066: -- Summary: Moving truncatedString to sql/catalyst Key: SPARK-26066 URL: https://issues.apache.org/jira/browse/SPARK-26066 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.4.0 Reporter: Maxim Gekk The truncatedString method is used to convert elements of TreeNodes and expressions to strings, and is called only from sql.* packages. This ticket aims to move the method out of core. We also need to introduce a SQL config to control the default maximum number of fields. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
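For readers unfamiliar with the helper, its behavior can be approximated by this self-contained sketch; the signature and the exact "... N more fields" wording are illustrative, not Spark's actual implementation:

```scala
// Render at most `maxFields` entries of a sequence, replacing the tail
// with a "... N more fields" marker, roughly like Spark's truncatedString.
def truncatedString[T](seq: Seq[T], sep: String, maxFields: Int): String =
  if (seq.length > maxFields) {
    val numRemoved = seq.length - maxFields + 1
    seq.take(maxFields - 1).mkString("", sep, sep + s"... $numRemoved more fields")
  } else {
    seq.mkString(sep)
  }
```

Moving such a helper into sql/catalyst lets the proposed SQL config supply `maxFields` without core depending on SQL.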
[jira] [Updated] (SPARK-26041) catalyst cuts out some columns from dataframes: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute
[ https://issues.apache.org/jira/browse/SPARK-26041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruslan Dautkhanov updated SPARK-26041: -- Environment: Spark 2.3.2 Hadoop 2.6 When we materialize one of the intermediate dataframes as a parquet table and read it back in, this error doesn't happen (exact same downstream queries). was: Spark 2.3.2 PySpark 2.7.15 + Hadoop 2.6 When we materialize one of the intermediate dataframes as a parquet table and read it back in, this error doesn't happen (exact same downstream queries). > catalyst cuts out some columns from dataframes: > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute > - > > Key: SPARK-26041 > URL: https://issues.apache.org/jira/browse/SPARK-26041 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0 > Environment: Spark 2.3.2 > Hadoop 2.6 > When we materialize one of the intermediate dataframes as a parquet table, and > read it back in, this error doesn't happen (exact same downstream queries). > >Reporter: Ruslan Dautkhanov >Priority: Major > Labels: catalyst, optimization > Attachments: SPARK-26041.txt > > > There is a workflow with a number of group-by's, joins, `exists` and `in`s > between a set of dataframes. > We are getting the following exception, and the reason is that Catalyst cuts some > columns out of the dataframes: > {noformat} > Unhandled error: , An error occurred > while calling o1187.cache. 
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 153 > in stage 2011.0 failed 4 times, most recent failure: Lost task 153.3 in stage > 2011.0 (TID 832340, pc1udatahad23, execut > or 153): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: > Binding attribute, tree: part_code#56012 > at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:91) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:90) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272) > at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:90) > at > org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.bind(GeneratePredicate.scala:45) > at > org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.bind(GeneratePredicate.scala:40) > at > 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1318) > at org.apache.spark.sql.execution.SparkPlan.newPredicate(SparkPlan.scala:401) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec.org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition$lzycompute(BroadcastNestedLoopJoinExec.scala:87) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec.org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition(BroadcastNestedLoopJoinExec.scala:85) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$5$$anonfun$apply$4$$anonfun$6.apply(BroadcastNestedLoopJoinExec.scala:210) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$5$$anonfun$apply$4$$anonfun$6.apply(BroadcastNestedLoopJoinExec.scala:210) > at > scala.collection.IndexedSeqOptimized$class.prefixLengthImpl(IndexedSeqOptimized.scala:38) > at > scala.collection.IndexedSeqOptimized$class.exists(IndexedSeqOptimized.scala:46) > at scala.collection.mutable.ArrayOps$ofRef.exists(ArrayOps.scala:186) > at >
[jira] [Commented] (SPARK-25982) Dataframe write is non blocking in fair scheduling mode
[ https://issues.apache.org/jira/browse/SPARK-25982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687034#comment-16687034 ] Ramandeep Singh commented on SPARK-25982: - Sure, a) The scheduler setting is the fair scheduler: --conf 'spark.scheduler.mode'='FAIR' b) There are independent jobs at one stage that are scheduled. This is okay; all of them block on the dataframe write to complete.
```
val futures = steps.par.map(stepId => Future { processWrite(stepsMap(stepId)) }).par
futures.foreach(Await.result(_, Duration.create(timeout, TimeUnit.MINUTES)))
```
Here, processWrite processes write operations in parallel and awaits on each of them to complete, but the persist or write operation returns before it has written all the partitions of the dataframes, so other jobs at a later stage end up being run. > Dataframe write is non blocking in fair scheduling mode > --- > > Key: SPARK-25982 > URL: https://issues.apache.org/jira/browse/SPARK-25982 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Ramandeep Singh >Priority: Major > > Hi, > I have noticed that the expected blocking behavior of the dataframe write > operation is not working in fair scheduling mode. > Ideally, when a dataframe write is occurring and a future is blocking on > AwaitResult, no other job should be started, but this is not the case. I have > noticed that other jobs are started while the partitions are being written. > > Regards, > Ramandeep Singh > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
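The pattern in the comment reduces to the runnable sketch below (processWrite, steps, and the fixed timeout are stand-ins for the reporter's real per-step write). Note that `Await.result` only guarantees the `Future` itself completed; if the write call inside the future returns before all partitions are committed, the barrier does not help, which is exactly the behavior being reported:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Stand-in for the real write step; returns when the "write" is done.
def processWrite(step: String): String = s"wrote $step"

val steps = Seq("a", "b", "c")
// Launch the independent per-step writes...
val futures = steps.map(step => Future(processWrite(step)))
// ...then block until every one of them completes.
val results = futures.map(f => Await.result(f, 1.minute))
```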
[jira] [Commented] (SPARK-26051) Can't create table with column name '22222d'
[ https://issues.apache.org/jira/browse/SPARK-26051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687025#comment-16687025 ] Dilip Biswal commented on SPARK-26051: -- I would like to take a look at this one. > Can't create table with column name '2d' > > > Key: SPARK-26051 > URL: https://issues.apache.org/jira/browse/SPARK-26051 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: Xie Juntao >Priority: Minor > > I can't create a table in which the column name is '2d' when I use > spark-sql. It seems to be a SQL parser bug, because it's OK to create a table > with the column name '2m'. > {code:java} > spark-sql> create table t1(2d int); > Error in query: > no viable alternative at input 'create table t1(2d'(line 1, pos 16) > == SQL == > create table t1(2d int) > ^^^ > spark-sql> create table t1(2m int); > 18/11/14 09:13:53 INFO HiveMetaStore: 0: get_database: global_temp > 18/11/14 09:13:53 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: > global_temp > 18/11/14 09:13:53 WARN ObjectStore: Failed to get database global_temp, > returning NoSuchObjectException > 18/11/14 09:13:55 INFO HiveMetaStore: 0: get_database: default > 18/11/14 09:13:55 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: > default > 18/11/14 09:13:55 INFO HiveMetaStore: 0: get_database: default > 18/11/14 09:13:55 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: > default > 18/11/14 09:13:55 INFO HiveMetaStore: 0: get_table : db=default tbl=t1 > 18/11/14 09:13:55 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table : > db=default tbl=t1 > 18/11/14 09:13:55 INFO HiveMetaStore: 0: get_database: default > 18/11/14 09:13:55 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: > default > 18/11/14 09:13:55 INFO HiveMetaStore: 0: create_table: Table(tableName:t1, > dbName:default, owner:root, createTime:1542158033, lastAccessTime:0, > retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:2m, type:int, > 
comment:null)], > location:file:/opt/UQuery/spark_/spark-2.3.1-bin-hadoop2.7/spark-warehouse/t1, > inputFormat:org.apache.hadoop.mapred.TextInputFormat, > outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, > compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, > serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > parameters:{serialization.format=1}), bucketCols:[], sortCols:[], > parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], > skewedColValueLocationMaps:{})), partitionKeys:[], > parameters:{spark.sql.sources.schema.part.0={"type":"struct","fields":[{"name":"2m","type":"integer","nullable":true,"metadata":{}}]}, > spark.sql.sources.schema.numParts=1, spark.sql.create.version=2.3.1}, > viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, > privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null, > rolePrivileges:null)) > 18/11/14 09:13:55 INFO audit: ugi=root ip=unknown-ip-addr cmd=create_table: > Table(tableName:t1, dbName:default, owner:root, createTime:1542158033, > lastAccessTime:0, retention:0, > sd:StorageDescriptor(cols:[FieldSchema(name:2m, type:int, comment:null)], > location:file:/opt/UQuery/spark_/spark-2.3.1-bin-hadoop2.7/spark-warehouse/t1, > inputFormat:org.apache.hadoop.mapred.TextInputFormat, > outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, > compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, > serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > parameters:{serialization.format=1}), bucketCols:[], sortCols:[], > parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], > skewedColValueLocationMaps:{})), partitionKeys:[], > parameters:{spark.sql.sources.schema.part.0={"type":"struct","fields":[{"name":"2m","type":"integer","nullable":true,"metadata":{}}]}, > spark.sql.sources.schema.numParts=1, spark.sql.create.version=2.3.1}, > viewOriginalText:null, 
viewExpandedText:null, tableType:MANAGED_TABLE, > privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null, > rolePrivileges:null)) > 18/11/14 09:13:55 WARN HiveMetaStore: Location: > file:/opt/UQuery/spark_/spark-2.3.1-bin-hadoop2.7/spark-warehouse/t1 > specified for non-external table:t1 > 18/11/14 09:13:55 INFO FileUtils: Creating directory if it doesn't exist: > file:/opt/UQuery/spark_/spark-2.3.1-bin-hadoop2.7/spark-warehouse/t1 > Time taken: 2.15 seconds > 18/11/14 09:13:56 INFO SparkSQLCLIDriver: Time taken: 2.15 seconds{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail:
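A plausible explanation (unconfirmed in the ticket) is that `2d` lexes as a numeric literal with a `D` (double) suffix, while `2m` matches no numeric suffix and so falls through to the identifier rule. Until the parser accepts such names unquoted, the standard workaround is to backtick-quote the identifier, which Spark SQL already supports:

```sql
-- Backtick quoting forces the lexer to treat 2d as an identifier rather
-- than a double literal:
CREATE TABLE t1 (`2d` INT);
```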
[jira] [Assigned] (SPARK-25965) Add read benchmark for Avro
[ https://issues.apache.org/jira/browse/SPARK-25965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-25965: - Assignee: Gengliang Wang > Add read benchmark for Avro > --- > > Key: SPARK-25965 > URL: https://issues.apache.org/jira/browse/SPARK-25965 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > Fix For: 3.0.0 > > > Add read benchmark for Avro, which is missing for a period. > The benchmark is similar to DataSourceReadBenchmark and OrcReadBenchmark -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25965) Add read benchmark for Avro
[ https://issues.apache.org/jira/browse/SPARK-25965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-25965. --- Resolution: Fixed Fix Version/s: 3.0.0 This is resolved via https://github.com/apache/spark/pull/22966 > Add read benchmark for Avro > --- > > Key: SPARK-25965 > URL: https://issues.apache.org/jira/browse/SPARK-25965 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > Fix For: 3.0.0 > > > Add read benchmark for Avro, which is missing for a period. > The benchmark is similar to DataSourceReadBenchmark and OrcReadBenchmark -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26065) Change query hint from a `LogicalPlan` to a field
[ https://issues.apache.org/jira/browse/SPARK-26065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26065: Assignee: Apache Spark > Change query hint from a `LogicalPlan` to a field > - > > Key: SPARK-26065 > URL: https://issues.apache.org/jira/browse/SPARK-26065 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maryann Xue >Assignee: Apache Spark >Priority: Major > > The existing query hint implementation relies on a logical plan node > {{ResolvedHint}} to store query hints in logical plans, and on {{Statistics}} > in physical plans. Since {{ResolvedHint}} is not really a logical operator > and can break the pattern matching for existing and future optimization > rules, it is as much an issue for the Optimizer as the old {{AnalysisBarrier}} > was for the Analyzer. > Given that all our query hints are either 1) a join hint, i.e., a > broadcast hint, or 2) a re-partition hint (which is indeed an operator), we > only need to add a hint field on the {{Join}} plan, and that will be a good > enough solution for current hint usage. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
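The pattern-matching hazard the description mentions can be illustrated with a toy plan algebra; these case classes are illustrative stand-ins, not Spark's actual Catalyst nodes:

```scala
sealed trait Plan
case class Leaf(name: String) extends Plan
// Wrapper-node design: the hint is its own plan node sitting above the join.
case class ResolvedHint(hint: String, child: Plan) extends Plan
// Field design: the hint rides along on the Join node itself.
case class Join(left: Plan, right: Plan, hint: Option[String] = None) extends Plan

// A rule that matches Join directly still fires when the hint is a field,
// but is silently skipped when a wrapper node hides the Join underneath.
def isDirectJoin(p: Plan): Boolean = p match {
  case _: Join => true
  case _       => false
}
```

With the field design, optimizer rules written against `Join` need no special cases for hinted plans; with the wrapper design, every such rule must remember to look through `ResolvedHint`.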
[jira] [Assigned] (SPARK-26065) Change query hint from a `LogicalPlan` to a field
[ https://issues.apache.org/jira/browse/SPARK-26065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26065: Assignee: (was: Apache Spark) > Change query hint from a `LogicalPlan` to a field > - > > Key: SPARK-26065 > URL: https://issues.apache.org/jira/browse/SPARK-26065 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maryann Xue >Priority: Major > > The existing query hint implementation relies on a logical plan node > {{ResolvedHint}} to store query hints in logical plans, and on {{Statistics}} > in physical plans. Since {{ResolvedHint}} is not really a logical operator > and can break the pattern matching for existing and future optimization > rules, it is as much an issue for the Optimizer as the old {{AnalysisBarrier}} > was for the Analyzer. > Given that all our query hints are either 1) a join hint, i.e., a > broadcast hint, or 2) a re-partition hint (which is indeed an operator), we > only need to add a hint field on the {{Join}} plan, and that will be a good > enough solution for current hint usage. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26065) Change query hint from a `LogicalPlan` to a field
[ https://issues.apache.org/jira/browse/SPARK-26065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687003#comment-16687003 ] Apache Spark commented on SPARK-26065: -- User 'maryannxue' has created a pull request for this issue: https://github.com/apache/spark/pull/23036 > Change query hint from a `LogicalPlan` to a field > - > > Key: SPARK-26065 > URL: https://issues.apache.org/jira/browse/SPARK-26065 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maryann Xue >Priority: Major > > The existing query hint implementation relies on a logical plan node > {{ResolvedHint}} to store query hints in logical plans, and on {{Statistics}} > in physical plans. Since {{ResolvedHint}} is not really a logical operator > and can break the pattern matching for existing and future optimization > rules, it is as much an issue for the Optimizer as the old {{AnalysisBarrier}} > was for the Analyzer. > Given that all our query hints are either 1) a join hint, i.e., a > broadcast hint, or 2) a re-partition hint (which is indeed an operator), we > only need to add a hint field on the {{Join}} plan, and that will be a good > enough solution for current hint usage. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26042) KafkaContinuousSourceTopicDeletionSuite may hang forever
[ https://issues.apache.org/jira/browse/SPARK-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-26042. -- Resolution: Fixed Fix Version/s: 3.0.0 2.4.1 > KafkaContinuousSourceTopicDeletionSuite may hang forever > > > Key: SPARK-26042 > URL: https://issues.apache.org/jira/browse/SPARK-26042 > Project: Spark > Issue Type: Test > Components: Structured Streaming, Tests >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.4.1, 3.0.0 > > > Saw the following thread dump in some build: > {code} > "stream execution thread for [id = 1c13482e-1edf-4b5c-b63a-d652738c8a48, > runId = 10667ce9-7eac-4cef-a525-f1bd08eb50f1]" #4406 daemon prio=5 os_prio=0 > tid=0x7fab1d3c5000 nid=0x7f4b waiting on condition [0x7fa96efcb000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00070a904cf8> (a > scala.concurrent.impl.Promise$CompletionLatch) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) > ... 
> at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:180) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:109) > - locked <0x00070a913ee8> (a > org.apache.spark.sql.execution.streaming.IncrementalExecution) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:109) > at > org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution$$anonfun$runContinuous$3$$anonfun$apply$1.apply(ContinuousExecution.scala:270) > at > org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution$$anonfun$runContinuous$3$$anonfun$apply$1.apply(ContinuousExecution.scala:270) > ,,, > "pool-1-thread-1-ScalaTest-running-KafkaContinuousSourceTopicDeletionSuite" > #20 prio=5 os_prio=0 tid=0x7fabc4e78800 nid=0x23be waiting for monitor > entry [0x7fab3dbff000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:100) > - waiting to lock <0x00070a913ee8> (a > org.apache.spark.sql.execution.streaming.IncrementalExecution) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:100) > at > org.apache.spark.sql.kafka010.KafkaContinuousSourceTopicDeletionSuite$$anonfun$3$$anonfun$apply$mcV$sp$12$$anonfun$apply$15.apply(KafkaContinuousSourceSuite.scala:210) > at > org.apache.spark.sql.kafka010.KafkaContinuousSourceTopicDeletionSuite$$anonfun$3$$anonfun$apply$mcV$sp$12$$anonfun$apply$15.apply(KafkaContinuousSourceSuite.scala:209) > ... > {code} > It hung forever because the test main thread was trying to access > `executedPlan` but the lock was held by the streaming thread. > This is a pretty common issue when using lazy vals as all lazy vals share the > same lock. 
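The lazy-val pitfall described above can be reproduced outside Spark. In Scala 2.12/2.13, initializing any lazy val of an instance holds that instance's monitor, so a second thread touching a *different* lazy val of the same object blocks until the first finishes (Scala 3 changed this initialization scheme). The names below are illustrative stand-ins for `toRdd` and `executedPlan`:

```scala
// Two unrelated lazy vals on one object: in Scala 2.x both initializers
// synchronize on the same instance monitor.
class Holder {
  lazy val slow: Int = { Thread.sleep(200); 1 } // plays the role of toRdd
  lazy val fast: Int = 2                        // plays the role of executedPlan
}

val h = new Holder
val t = new Thread(() => { h.slow; () }) // streaming thread: starts the slow init
t.start()
Thread.sleep(50)  // give the other thread time to begin initializing `slow`
val v = h.fast    // test thread: on Scala 2.x this waits for `slow` to finish
t.join()
```

In the Spark test above the "slow" initializer never finished (it was itself waiting on the query), so the main thread's access blocked forever rather than merely being delayed.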
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26041) catalyst cuts out some columns from dataframes: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute
[ https://issues.apache.org/jira/browse/SPARK-26041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686907#comment-16686907 ] Ruslan Dautkhanov edited comment on SPARK-26041 at 11/14/18 5:45 PM: - thanks for checking this [~mgaido] just attached txt file that shows sequence of dataframe creation and last failing dataframe too All SparkSQL It always reproduces this issue for us. Let us know what you find out. was (Author: tagar): thank for checking this [~mgaido] just attached txt file that shows sequence of dataframe creation and last failing dataframe too All SparkSQL It always reproduces this issue for us. Let us know what you find out. > catalyst cuts out some columns from dataframes: > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute > - > > Key: SPARK-26041 > URL: https://issues.apache.org/jira/browse/SPARK-26041 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0 > Environment: Spark 2.3.2 > PySpark 2.7.15 + Hadoop 2.6 > When we materialize one of intermediate dataframes as a parquet table, and > read it back in, this error doesn't happen (exact same downflow queries ). > >Reporter: Ruslan Dautkhanov >Priority: Major > Labels: catalyst, optimization > Attachments: SPARK-26041.txt > > > There is a workflow with a number of group-by's, joins, `exists` and `in`s > between a set of dataframes. > We are getting following exception and the reason that the Catalyst cuts some > columns out of dataframes: > {noformat} > Unhandled error: , An error occurred > while calling o1187.cache. 
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 153 > in stage 2011.0 failed 4 times, most recent failure: Lost task 153.3 in stage > 2011.0 (TID 832340, pc1udatahad23, execut > or 153): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: > Binding attribute, tree: part_code#56012 > at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:91) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:90) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272) > at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:90) > at > org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.bind(GeneratePredicate.scala:45) > at > org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.bind(GeneratePredicate.scala:40) > at > 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1318) > at org.apache.spark.sql.execution.SparkPlan.newPredicate(SparkPlan.scala:401) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec.org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition$lzycompute(BroadcastNestedLoopJoinExec.scala:87) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec.org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition(BroadcastNestedLoopJoinExec.scala:85) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$5$$anonfun$apply$4$$anonfun$6.apply(BroadcastNestedLoopJoinExec.scala:210) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$5$$anonfun$apply$4$$anonfun$6.apply(BroadcastNestedLoopJoinExec.scala:210) > at > scala.collection.IndexedSeqOptimized$class.prefixLengthImpl(IndexedSeqOptimized.scala:38) > at >
[jira] [Commented] (SPARK-26041) catalyst cuts out some columns from dataframes: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute
[ https://issues.apache.org/jira/browse/SPARK-26041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686907#comment-16686907 ] Ruslan Dautkhanov commented on SPARK-26041: --- thanks for checking this [~mgaido] just attached sql that shows the sequence of dataframe creation and the last failing dataframe too. All SparkSQL. It always reproduces this issue for us. Let us know what you find out.
[jira] [Updated] (SPARK-26041) catalyst cuts out some columns from dataframes: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute
[ https://issues.apache.org/jira/browse/SPARK-26041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruslan Dautkhanov updated SPARK-26041: -- Attachment: SPARK-26041.txt
[jira] [Resolved] (SPARK-23067) Allow for easier debugging of the docker container
[ https://issues.apache.org/jira/browse/SPARK-23067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-23067. Resolution: Duplicate This was added in SPARK-24534. Yay for searching before filing a new bug... > Allow for easier debugging of the docker container > -- > > Key: SPARK-23067 > URL: https://issues.apache.org/jira/browse/SPARK-23067 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.0 >Reporter: Anirudh Ramanathan >Priority: Minor > > `docker run -it foxish/spark:v2.3.0 /bin/bash` fails because we don't accept > any command except (driver, executor and init). Consider piping the unknown > commands through when they're unknown. > It is still possible to do something like: > `docker run -it --entrypoint=/bin/bash foxish/spark:v2.3.0` now for debugging > but it's common to try and run a different command as specified above. Also > consider documenting how to debug/inspect the docker images. > [~vanzin] [~kimoonkim] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26065) Change query hint from a `LogicalPlan` to a field
Maryann Xue created SPARK-26065: --- Summary: Change query hint from a `LogicalPlan` to a field Key: SPARK-26065 URL: https://issues.apache.org/jira/browse/SPARK-26065 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: Maryann Xue The existing query hint implementation relies on a logical plan node {{ResolvedHint}} to store query hints in logical plans, and on {{Statistics}} in physical plans. Since {{ResolvedHint}} is not really a logical operator and can break pattern matching in existing and future optimization rules, it is as much a problem for the Optimizer as the old {{AnalysisBarrier}} was for the Analyzer. Given that all our query hints are either 1) a join hint, i.e., the broadcast hint, or 2) a re-partition hint, which is indeed an operator, we only need to add a hint field on the {{Join}} plan, and that will be a good enough solution for current hint usage.
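The pattern-matching problem described above can be sketched with a toy tree in plain Python (this is an illustration only, not Spark/Catalyst code; the class names mirror Catalyst's but everything else is made up for the example). A wrapper node like {{ResolvedHint}} changes the shape of the plan tree, so a rule that matches on that shape silently stops firing; a hint stored as a field on the join node leaves the shape untouched:

```python
# Toy illustration (not Spark code) of why carrying a hint as an extra
# tree node breaks shape-based optimizer rules, while a hint field does not.
from dataclasses import dataclass, field


@dataclass
class Plan:
    pass


@dataclass
class Scan(Plan):
    table: str


@dataclass
class Join(Plan):
    left: Plan
    right: Plan
    hints: set = field(default_factory=set)  # hint carried as a field


@dataclass
class ResolvedHint(Plan):  # hint carried as an extra tree node
    child: Plan
    name: str


def match_join_of_scans(plan):
    """A toy optimizer rule that fires only on the shape Join(Scan, Scan)."""
    return (isinstance(plan, Join)
            and isinstance(plan.left, Scan)
            and isinstance(plan.right, Scan))


# With the hint as a wrapper node, the extra node hides the Scan child
# and the rule no longer matches:
wrapped = Join(ResolvedHint(Scan("a"), "broadcast"), Scan("b"))

# With the hint as a field on Join, the tree shape is preserved:
flat = Join(Scan("a"), Scan("b"), hints={"broadcast"})

print(match_join_of_scans(wrapped))  # False
print(match_join_of_scans(flat))     # True
```

This is the essence of the proposal: keeping the hint out of the tree structure means existing rules keep matching without having to be taught to look through hint nodes.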
[jira] [Commented] (SPARK-26041) catalyst cuts out some columns from dataframes: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute
[ https://issues.apache.org/jira/browse/SPARK-26041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686870#comment-16686870 ] Marco Gaido commented on SPARK-26041: - No, it is not; for 2.3 we would need a dedicated fix.
[jira] [Commented] (SPARK-26041) catalyst cuts out some columns from dataframes: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute
[ https://issues.apache.org/jira/browse/SPARK-26041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686857#comment-16686857 ] Marco Gaido commented on SPARK-26041: - Then it'd help if you could provide a reproducer for this... Thanks.
[jira] [Commented] (SPARK-26041) catalyst cuts out some columns from dataframes: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute
[ https://issues.apache.org/jira/browse/SPARK-26041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686851#comment-16686851 ] Ruslan Dautkhanov commented on SPARK-26041: --- Thanks for referencing that jira [~mgaido]. SPARK-26057 seems to be Spark 2.4-specific, judging from its description. We see this problem in Spark 2.3.1 and in Spark 2.3.2. Can you check if https://github.com/apache/spark/pull/23035 is applicable to Spark 2.3 too? Thanks
[jira] [Updated] (SPARK-26041) catalyst cuts out some columns from dataframes: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute
[ https://issues.apache.org/jira/browse/SPARK-26041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruslan Dautkhanov updated SPARK-26041: -- Affects Version/s: 2.3.0 2.3.1 > catalyst cuts out some columns from dataframes: > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute > - > > Key: SPARK-26041 > URL: https://issues.apache.org/jira/browse/SPARK-26041 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0 > Environment: Spark 2.3.2 > PySpark 2.7.15 + Hadoop 2.6 > When we materialize one of the intermediate dataframes as a parquet table, and > read it back in, this error doesn't happen (exact same downstream queries). > >Reporter: Ruslan Dautkhanov >Priority: Major > Labels: catalyst, optimization > > There is a workflow with a number of group-by's, joins, `exists` and `in`s > between a set of dataframes. > We are getting the following exception, and the reason is that Catalyst cuts some > columns out of the dataframes: > {noformat} > Unhandled error: , An error occurred > while calling o1187.cache. 
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 153 > in stage 2011.0 failed 4 times, most recent failure: Lost task 153.3 in stage > 2011.0 (TID 832340, pc1udatahad23, execut > or 153): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: > Binding attribute, tree: part_code#56012 > at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:91) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:90) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272) > at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:90) > at > org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.bind(GeneratePredicate.scala:45) > at > org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.bind(GeneratePredicate.scala:40) > at > 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1318) > at org.apache.spark.sql.execution.SparkPlan.newPredicate(SparkPlan.scala:401) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec.org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition$lzycompute(BroadcastNestedLoopJoinExec.scala:87) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec.org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition(BroadcastNestedLoopJoinExec.scala:85) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$5$$anonfun$apply$4$$anonfun$6.apply(BroadcastNestedLoopJoinExec.scala:210) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$5$$anonfun$apply$4$$anonfun$6.apply(BroadcastNestedLoopJoinExec.scala:210) > at > scala.collection.IndexedSeqOptimized$class.prefixLengthImpl(IndexedSeqOptimized.scala:38) > at > scala.collection.IndexedSeqOptimized$class.exists(IndexedSeqOptimized.scala:46) > at scala.collection.mutable.ArrayOps$ofRef.exists(ArrayOps.scala:186) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$5$$anonfun$apply$4.apply(BroadcastNestedLoopJoinExec.scala:210) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$5$$anonfun$apply$4.apply(BroadcastNestedLoopJoinExec.scala:209) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > at
[jira] [Commented] (SPARK-26041) catalyst cuts out some columns from dataframes: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute
[ https://issues.apache.org/jira/browse/SPARK-26041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686828#comment-16686828 ] Marco Gaido commented on SPARK-26041: - I think this may be a duplicate of SPARK-26057 (or the other way around). Could you please check whether the fix for SPARK-26057 I just submitted fixes your case too? Thanks. > catalyst cuts out some columns from dataframes: > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute > - > > Key: SPARK-26041 > URL: https://issues.apache.org/jira/browse/SPARK-26041 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core >Affects Versions: 2.3.2, 2.4.0 > Environment: Spark 2.3.2 > PySpark 2.7.15 + Hadoop 2.6 > When we materialize one of the intermediate dataframes as a parquet table, and > read it back in, this error doesn't happen (exact same downstream queries). > >Reporter: Ruslan Dautkhanov >Priority: Major > Labels: catalyst, optimization > > There is a workflow with a number of group-by's, joins, `exists` and `in`s > between a set of dataframes. > We are getting the following exception, and the reason is that Catalyst cuts some > columns out of the dataframes: > {noformat} > Unhandled error: , An error occurred > while calling o1187.cache. 
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 153 > in stage 2011.0 failed 4 times, most recent failure: Lost task 153.3 in stage > 2011.0 (TID 832340, pc1udatahad23, execut > or 153): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: > Binding attribute, tree: part_code#56012 > at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:91) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:90) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272) > at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:90) > at > org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.bind(GeneratePredicate.scala:45) > at > org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.bind(GeneratePredicate.scala:40) > at > 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1318) > at org.apache.spark.sql.execution.SparkPlan.newPredicate(SparkPlan.scala:401) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec.org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition$lzycompute(BroadcastNestedLoopJoinExec.scala:87) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec.org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition(BroadcastNestedLoopJoinExec.scala:85) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$5$$anonfun$apply$4$$anonfun$6.apply(BroadcastNestedLoopJoinExec.scala:210) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$5$$anonfun$apply$4$$anonfun$6.apply(BroadcastNestedLoopJoinExec.scala:210) > at > scala.collection.IndexedSeqOptimized$class.prefixLengthImpl(IndexedSeqOptimized.scala:38) > at > scala.collection.IndexedSeqOptimized$class.exists(IndexedSeqOptimized.scala:46) > at scala.collection.mutable.ArrayOps$ofRef.exists(ArrayOps.scala:186) > at > org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$5$$anonfun$apply$4.apply(BroadcastNestedLoopJoinExec.scala:210) > at >
[jira] [Resolved] (SPARK-25118) Need a solution to persist Spark application console outputs when running in shell/yarn client mode
[ https://issues.apache.org/jira/browse/SPARK-25118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-25118. Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 22504 [https://github.com/apache/spark/pull/22504] > Need a solution to persist Spark application console outputs when running in > shell/yarn client mode > --- > > Key: SPARK-25118 > URL: https://issues.apache.org/jira/browse/SPARK-25118 > Project: Spark > Issue Type: Improvement > Components: Spark Submit >Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0 >Reporter: Ankur Gupta >Assignee: Ankur Gupta >Priority: Major > Fix For: 3.0.0 > > > We execute Spark applications in YARN Client mode a lot of the time. When we do > so, the Spark Driver logs are printed to the console. > We need a solution to persist the console outputs for later usage. This can > be either for doing some troubleshooting or for some other log analysis. > Ideally, we would like to persist these along with Yarn logs (when the > application is run in Yarn Client mode). Also, this has to be end-user > agnostic, so that the logs are available for later usage without requiring > the end-user to make some configuration changes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
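The fix referenced above (PR 22504) adds this capability to Spark itself; before it, a common stop-gap was a custom log4j configuration that mirrors console output to a file. The sketch below is a hypothetical log4j 1.2 properties file for Spark 2.x; the file path, size limit, and backup count are placeholder values, not anything prescribed by the PR.

```properties
# Hypothetical log4j.properties for Spark 2.x (log4j 1.2): keep driver logs
# on the console AND mirror them to a rolling file for later analysis.
log4j.rootCategory=INFO, console, file

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Placeholder path and sizes; adjust per cluster.
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/spark/driver.log
log4j.appender.file.MaxFileSize=50MB
log4j.appender.file.MaxBackupIndex=5
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

Such a file would be picked up with something like `spark-submit --driver-java-options "-Dlog4j.configuration=file:/path/to/log4j.properties" ...`. Note that this only captures log4j output, not arbitrary stdout prints, and it requires per-user configuration, which is exactly the gap the JIRA asks Spark to close with an end-user-agnostic solution.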
[jira] [Assigned] (SPARK-25118) Need a solution to persist Spark application console outputs when running in shell/yarn client mode
[ https://issues.apache.org/jira/browse/SPARK-25118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned SPARK-25118: -- Assignee: Ankur Gupta > Need a solution to persist Spark application console outputs when running in > shell/yarn client mode > --- > > Key: SPARK-25118 > URL: https://issues.apache.org/jira/browse/SPARK-25118 > Project: Spark > Issue Type: Improvement > Components: Spark Submit >Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0 >Reporter: Ankur Gupta >Assignee: Ankur Gupta >Priority: Major > > We execute Spark applications in YARN Client mode a lot of the time. When we do > so, the Spark Driver logs are printed to the console. > We need a solution to persist the console outputs for later usage. This can > be either for doing some troubleshooting or for some other log analysis. > Ideally, we would like to persist these along with Yarn logs (when the > application is run in Yarn Client mode). Also, this has to be end-user > agnostic, so that the logs are available for later usage without requiring > the end-user to make some configuration changes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26057) Table joining is broken in Spark 2.4
[ https://issues.apache.org/jira/browse/SPARK-26057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686715#comment-16686715 ] Apache Spark commented on SPARK-26057: -- User 'mgaido91' has created a pull request for this issue: https://github.com/apache/spark/pull/23035 > Table joining is broken in Spark 2.4 > > > Key: SPARK-26057 > URL: https://issues.apache.org/jira/browse/SPARK-26057 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Pavel Parkhomenko >Priority: Major > > This sample works in spark-shell 2.3.1 and throws an exception in 2.4.0 > {code:java} > import java.util.Arrays.asList > import org.apache.spark.sql.Row > import org.apache.spark.sql.types._ > spark.createDataFrame( > asList( > Row("1-1", "sp", 6), > Row("1-1", "pc", 5), > Row("1-2", "pc", 4), > Row("2-1", "sp", 3), > Row("2-2", "pc", 2), > Row("2-2", "sp", 1) > ), > StructType(List(StructField("id", StringType), StructField("layout", > StringType), StructField("n", IntegerType))) > ).createOrReplaceTempView("cc") > spark.createDataFrame( > asList( > Row("sp", 1), > Row("sp", 1), > Row("sp", 2), > Row("sp", 3), > Row("sp", 3), > Row("sp", 4), > Row("sp", 5), > Row("sp", 5), > Row("pc", 1), > Row("pc", 2), > Row("pc", 2), > Row("pc", 3), > Row("pc", 4), > Row("pc", 4), > Row("pc", 5) > ), > StructType(List(StructField("layout", StringType), StructField("ts", > IntegerType))) > ).createOrReplaceTempView("p") > spark.createDataFrame( > asList( > Row("1-1", "sp", 1), > Row("1-1", "sp", 2), > Row("1-1", "pc", 3), > Row("1-2", "pc", 3), > Row("1-2", "pc", 4), > Row("2-1", "sp", 4), > Row("2-1", "sp", 5), > Row("2-2", "pc", 6), > Row("2-2", "sp", 6) > ), > StructType(List(StructField("id", StringType), StructField("layout", > StringType), StructField("ts", IntegerType))) > ).createOrReplaceTempView("c") > spark.sql(""" > SELECT cc.id, cc.layout, count(*) as m > FROM cc > JOIN p USING(layout) > WHERE EXISTS(SELECT 1 FROM c WHERE c.id = cc.id 
AND c.layout = cc.layout > AND c.ts > p.ts) > GROUP BY cc.id, cc.layout > """).createOrReplaceTempView("pcc") > spark.sql("SELECT * FROM pcc ORDER BY id, layout").show > spark.sql(""" > SELECT cc.id, cc.layout, n, m > FROM cc > LEFT OUTER JOIN pcc ON pcc.id = cc.id AND pcc.layout = cc.layout > """).createOrReplaceTempView("k") > spark.sql("SELECT * FROM k ORDER BY id, layout").show > {code} > Actually I tried to catch another bug: similar calculations with joins and > nested queries have different results in Spark 2.3.1 and 2.4.0, but when I > tried to create a minimal example I received exception > {code:java} > java.lang.RuntimeException: Couldn't find id#0 in > [id#38,layout#39,ts#7,id#10,layout#11,ts#12] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
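For reference, the intended result of the `pcc` query in the reproduction above can be re-derived outside Spark. The plain-Python sketch below is not Spark code; it simply recomputes the same grouped count under standard SQL `EXISTS` semantics, using the exact data from the reproduction:

```python
# Recompute SELECT cc.id, cc.layout, count(*) AS m FROM cc JOIN p USING(layout)
# WHERE EXISTS(SELECT 1 FROM c WHERE c.id = cc.id AND c.layout = cc.layout
#              AND c.ts > p.ts)
# GROUP BY cc.id, cc.layout  -- in plain Python, to document the expected output.
cc = [("1-1", "sp", 6), ("1-1", "pc", 5), ("1-2", "pc", 4),
      ("2-1", "sp", 3), ("2-2", "pc", 2), ("2-2", "sp", 1)]
p = [("sp", 1), ("sp", 1), ("sp", 2), ("sp", 3), ("sp", 3), ("sp", 4),
     ("sp", 5), ("sp", 5), ("pc", 1), ("pc", 2), ("pc", 2), ("pc", 3),
     ("pc", 4), ("pc", 4), ("pc", 5)]
c = [("1-1", "sp", 1), ("1-1", "sp", 2), ("1-1", "pc", 3), ("1-2", "pc", 3),
     ("1-2", "pc", 4), ("2-1", "sp", 4), ("2-1", "sp", 5), ("2-2", "pc", 6),
     ("2-2", "sp", 6)]

pcc = {}
for cid, clayout, _n in cc:
    for playout, pts in p:                      # JOIN p USING(layout)
        if playout != clayout:
            continue
        # correlated EXISTS subquery against c
        if any(i == cid and l == clayout and ts > pts for i, l, ts in c):
            pcc[(cid, clayout)] = pcc.get((cid, clayout), 0) + 1

print(sorted(pcc.items()))
```

Having the expected rows written down this way makes it easy to confirm that a patched Spark build (e.g. with the fix from PR 23035) returns the right answer rather than merely ceasing to throw.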
[jira] [Assigned] (SPARK-26057) Table joining is broken in Spark 2.4
[ https://issues.apache.org/jira/browse/SPARK-26057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26057: Assignee: (was: Apache Spark) > Table joining is broken in Spark 2.4 > > > Key: SPARK-26057 > URL: https://issues.apache.org/jira/browse/SPARK-26057 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Pavel Parkhomenko >Priority: Major > > This sample works in spark-shell 2.3.1 and throws an exception in 2.4.0 > {code:java} > import java.util.Arrays.asList > import org.apache.spark.sql.Row > import org.apache.spark.sql.types._ > spark.createDataFrame( > asList( > Row("1-1", "sp", 6), > Row("1-1", "pc", 5), > Row("1-2", "pc", 4), > Row("2-1", "sp", 3), > Row("2-2", "pc", 2), > Row("2-2", "sp", 1) > ), > StructType(List(StructField("id", StringType), StructField("layout", > StringType), StructField("n", IntegerType))) > ).createOrReplaceTempView("cc") > spark.createDataFrame( > asList( > Row("sp", 1), > Row("sp", 1), > Row("sp", 2), > Row("sp", 3), > Row("sp", 3), > Row("sp", 4), > Row("sp", 5), > Row("sp", 5), > Row("pc", 1), > Row("pc", 2), > Row("pc", 2), > Row("pc", 3), > Row("pc", 4), > Row("pc", 4), > Row("pc", 5) > ), > StructType(List(StructField("layout", StringType), StructField("ts", > IntegerType))) > ).createOrReplaceTempView("p") > spark.createDataFrame( > asList( > Row("1-1", "sp", 1), > Row("1-1", "sp", 2), > Row("1-1", "pc", 3), > Row("1-2", "pc", 3), > Row("1-2", "pc", 4), > Row("2-1", "sp", 4), > Row("2-1", "sp", 5), > Row("2-2", "pc", 6), > Row("2-2", "sp", 6) > ), > StructType(List(StructField("id", StringType), StructField("layout", > StringType), StructField("ts", IntegerType))) > ).createOrReplaceTempView("c") > spark.sql(""" > SELECT cc.id, cc.layout, count(*) as m > FROM cc > JOIN p USING(layout) > WHERE EXISTS(SELECT 1 FROM c WHERE c.id = cc.id AND c.layout = cc.layout > AND c.ts > p.ts) > GROUP BY cc.id, cc.layout > 
""").createOrReplaceTempView("pcc") > spark.sql("SELECT * FROM pcc ORDER BY id, layout").show > spark.sql(""" > SELECT cc.id, cc.layout, n, m > FROM cc > LEFT OUTER JOIN pcc ON pcc.id = cc.id AND pcc.layout = cc.layout > """).createOrReplaceTempView("k") > spark.sql("SELECT * FROM k ORDER BY id, layout").show > {code} > Actually I tried to catch another bug: similar calculations with joins and > nested queries have different results in Spark 2.3.1 and 2.4.0, but when I > tried to create a minimal example I received exception > {code:java} > java.lang.RuntimeException: Couldn't find id#0 in > [id#38,layout#39,ts#7,id#10,layout#11,ts#12] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26057) Table joining is broken in Spark 2.4
[ https://issues.apache.org/jira/browse/SPARK-26057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686710#comment-16686710 ] Apache Spark commented on SPARK-26057: -- User 'mgaido91' has created a pull request for this issue: https://github.com/apache/spark/pull/23035 > Table joining is broken in Spark 2.4 > > > Key: SPARK-26057 > URL: https://issues.apache.org/jira/browse/SPARK-26057 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Pavel Parkhomenko >Priority: Major > > This sample works in spark-shell 2.3.1 and throws an exception in 2.4.0 > {code:java} > import java.util.Arrays.asList > import org.apache.spark.sql.Row > import org.apache.spark.sql.types._ > spark.createDataFrame( > asList( > Row("1-1", "sp", 6), > Row("1-1", "pc", 5), > Row("1-2", "pc", 4), > Row("2-1", "sp", 3), > Row("2-2", "pc", 2), > Row("2-2", "sp", 1) > ), > StructType(List(StructField("id", StringType), StructField("layout", > StringType), StructField("n", IntegerType))) > ).createOrReplaceTempView("cc") > spark.createDataFrame( > asList( > Row("sp", 1), > Row("sp", 1), > Row("sp", 2), > Row("sp", 3), > Row("sp", 3), > Row("sp", 4), > Row("sp", 5), > Row("sp", 5), > Row("pc", 1), > Row("pc", 2), > Row("pc", 2), > Row("pc", 3), > Row("pc", 4), > Row("pc", 4), > Row("pc", 5) > ), > StructType(List(StructField("layout", StringType), StructField("ts", > IntegerType))) > ).createOrReplaceTempView("p") > spark.createDataFrame( > asList( > Row("1-1", "sp", 1), > Row("1-1", "sp", 2), > Row("1-1", "pc", 3), > Row("1-2", "pc", 3), > Row("1-2", "pc", 4), > Row("2-1", "sp", 4), > Row("2-1", "sp", 5), > Row("2-2", "pc", 6), > Row("2-2", "sp", 6) > ), > StructType(List(StructField("id", StringType), StructField("layout", > StringType), StructField("ts", IntegerType))) > ).createOrReplaceTempView("c") > spark.sql(""" > SELECT cc.id, cc.layout, count(*) as m > FROM cc > JOIN p USING(layout) > WHERE EXISTS(SELECT 1 FROM c WHERE c.id = cc.id 
AND c.layout = cc.layout > AND c.ts > p.ts) > GROUP BY cc.id, cc.layout > """).createOrReplaceTempView("pcc") > spark.sql("SELECT * FROM pcc ORDER BY id, layout").show > spark.sql(""" > SELECT cc.id, cc.layout, n, m > FROM cc > LEFT OUTER JOIN pcc ON pcc.id = cc.id AND pcc.layout = cc.layout > """).createOrReplaceTempView("k") > spark.sql("SELECT * FROM k ORDER BY id, layout").show > {code} > Actually I tried to catch another bug: similar calculations with joins and > nested queries have different results in Spark 2.3.1 and 2.4.0, but when I > tried to create a minimal example I received exception > {code:java} > java.lang.RuntimeException: Couldn't find id#0 in > [id#38,layout#39,ts#7,id#10,layout#11,ts#12] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26057) Table joining is broken in Spark 2.4
[ https://issues.apache.org/jira/browse/SPARK-26057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26057: Assignee: Apache Spark > Table joining is broken in Spark 2.4 > > > Key: SPARK-26057 > URL: https://issues.apache.org/jira/browse/SPARK-26057 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Pavel Parkhomenko >Assignee: Apache Spark >Priority: Major > > This sample works in spark-shell 2.3.1 and throws an exception in 2.4.0 > {code:java} > import java.util.Arrays.asList > import org.apache.spark.sql.Row > import org.apache.spark.sql.types._ > spark.createDataFrame( > asList( > Row("1-1", "sp", 6), > Row("1-1", "pc", 5), > Row("1-2", "pc", 4), > Row("2-1", "sp", 3), > Row("2-2", "pc", 2), > Row("2-2", "sp", 1) > ), > StructType(List(StructField("id", StringType), StructField("layout", > StringType), StructField("n", IntegerType))) > ).createOrReplaceTempView("cc") > spark.createDataFrame( > asList( > Row("sp", 1), > Row("sp", 1), > Row("sp", 2), > Row("sp", 3), > Row("sp", 3), > Row("sp", 4), > Row("sp", 5), > Row("sp", 5), > Row("pc", 1), > Row("pc", 2), > Row("pc", 2), > Row("pc", 3), > Row("pc", 4), > Row("pc", 4), > Row("pc", 5) > ), > StructType(List(StructField("layout", StringType), StructField("ts", > IntegerType))) > ).createOrReplaceTempView("p") > spark.createDataFrame( > asList( > Row("1-1", "sp", 1), > Row("1-1", "sp", 2), > Row("1-1", "pc", 3), > Row("1-2", "pc", 3), > Row("1-2", "pc", 4), > Row("2-1", "sp", 4), > Row("2-1", "sp", 5), > Row("2-2", "pc", 6), > Row("2-2", "sp", 6) > ), > StructType(List(StructField("id", StringType), StructField("layout", > StringType), StructField("ts", IntegerType))) > ).createOrReplaceTempView("c") > spark.sql(""" > SELECT cc.id, cc.layout, count(*) as m > FROM cc > JOIN p USING(layout) > WHERE EXISTS(SELECT 1 FROM c WHERE c.id = cc.id AND c.layout = cc.layout > AND c.ts > p.ts) > GROUP BY cc.id, cc.layout > 
""").createOrReplaceTempView("pcc") > spark.sql("SELECT * FROM pcc ORDER BY id, layout").show > spark.sql(""" > SELECT cc.id, cc.layout, n, m > FROM cc > LEFT OUTER JOIN pcc ON pcc.id = cc.id AND pcc.layout = cc.layout > """).createOrReplaceTempView("k") > spark.sql("SELECT * FROM k ORDER BY id, layout").show > {code} > Actually I tried to catch another bug: similar calculations with joins and > nested queries have different results in Spark 2.3.1 and 2.4.0, but when I > tried to create a minimal example I received exception > {code:java} > java.lang.RuntimeException: Couldn't find id#0 in > [id#38,layout#39,ts#7,id#10,layout#11,ts#12] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26035) Break large streaming/tests.py files into smaller files
[ https://issues.apache.org/jira/browse/SPARK-26035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686661#comment-16686661 ] Apache Spark commented on SPARK-26035: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/23034 > Break large streaming/tests.py files into smaller files > --- > > Key: SPARK-26035 > URL: https://issues.apache.org/jira/browse/SPARK-26035 > Project: Spark > Issue Type: Sub-task > Components: DStreams, PySpark >Affects Versions: 2.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25868) One part of Spark MLlib Kmean Logic Performance problem
[ https://issues.apache.org/jira/browse/SPARK-25868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-25868. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 22893 [https://github.com/apache/spark/pull/22893] > One part of Spark MLlib Kmean Logic Performance problem > --- > > Key: SPARK-25868 > URL: https://issues.apache.org/jira/browse/SPARK-25868 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 2.3.2 >Reporter: Liang Li >Assignee: Liang Li >Priority: Minor > Fix For: 3.0.0 > > > In function fastSquaredDistance, there is a low performance logic: > the sqDist = Vectors.sqdist(v1, v2) is better than sqDist = sumSquaredNorm - > 2.0 * dot(v1, v2) in calculation performance > So get rid of the low performance logic in function fastSquaredDistance. > More test (end-to-end, function) details can be found in > https://github.com/apache/spark/pull/22893 > Already updated patch #22893 for merge > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25868) One part of Spark MLlib Kmean Logic Performance problem
[ https://issues.apache.org/jira/browse/SPARK-25868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-25868: - Assignee: Liang Li > One part of Spark MLlib Kmean Logic Performance problem > --- > > Key: SPARK-25868 > URL: https://issues.apache.org/jira/browse/SPARK-25868 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 2.3.2 >Reporter: Liang Li >Assignee: Liang Li >Priority: Minor > Fix For: 3.0.0 > > > In function fastSquaredDistance, there is a low performance logic: > the sqDist = Vectors.sqdist(v1, v2) is better than sqDist = sumSquaredNorm - > 2.0 * dot(v1, v2) in calculation performance > So get rid of the low performance logic in function fastSquaredDistance. > More test (end-to-end, function) details can be found in > https://github.com/apache/spark/pull/22893 > Already updated patch #22893 for merge > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
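On the formula quoted in the SPARK-25868 description: the two expressions are algebraically equal, since ||v1 - v2||^2 = ||v1||^2 + ||v2||^2 - 2<v1, v2>; the issue's point is only that the direct form is preferable for speed and floating-point precision. A quick plain-Python check (not Spark's actual MLlib implementation) of the equivalence:

```python
# Squared Euclidean distance two ways. sqdist_direct mimics what
# Vectors.sqdist computes; sqdist_norm_form is the norm-based formula the
# issue flags as the slower, less precise path.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sqdist_direct(a, b):
    # Direct sum of squared differences.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sqdist_norm_form(a, b):
    # ||a||^2 + ||b||^2 - 2 <a, b>; equal in exact arithmetic, but it can
    # suffer catastrophic cancellation when the vectors are close together.
    return dot(a, a) + dot(b, b) - 2.0 * dot(a, b)

v1 = [1.0, 2.0, 3.0]
v2 = [4.0, 5.0, 6.0]
print(sqdist_direct(v1, v2))     # 27.0
print(sqdist_norm_form(v1, v2))  # 27.0
```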
[jira] [Commented] (SPARK-26054) Creating a computed column applying the spark sql rounding on a column of type decimal affects the orginal column as well.
[ https://issues.apache.org/jira/browse/SPARK-26054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686677#comment-16686677 ] Apache Spark commented on SPARK-26054: -- User 'mgaido91' has created a pull request for this issue: https://github.com/apache/spark/pull/23035 > Creating a computed column applying the spark sql rounding on a column of > type decimal affects the orginal column as well. > -- > > Key: SPARK-26054 > URL: https://issues.apache.org/jira/browse/SPARK-26054 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Jaya Krishna >Priority: Minor > Attachments: sparksql-rounding.png > > > When a computed column that rounds the value is added to a data frame, it is > affecting the value of the original column as well. The behavior depends on > the database column type - If it is either float or double, the result is as > expected - the original column will have its own formatting and the computed > column will be rounded as per the rounding definition specified for it. > However if the column type in the database is decimal, then Spark SQL is > applying the rounding even to the original column. Attached image has the > spark sql code that shows the issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26054) Creating a computed column applying the spark sql rounding on a column of type decimal affects the orginal column as well.
[ https://issues.apache.org/jira/browse/SPARK-26054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686671#comment-16686671 ] Apache Spark commented on SPARK-26054: -- User 'mgaido91' has created a pull request for this issue: https://github.com/apache/spark/pull/23035 > Creating a computed column applying the spark sql rounding on a column of > type decimal affects the orginal column as well. > -- > > Key: SPARK-26054 > URL: https://issues.apache.org/jira/browse/SPARK-26054 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Jaya Krishna >Priority: Minor > Attachments: sparksql-rounding.png > > > When a computed column that rounds the value is added to a data frame, it is > affecting the value of the original column as well. The behavior depends on > the database column type - If it is either float or double, the result is as > expected - the original column will have its own formatting and the computed > column will be rounded as per the rounding definition specified for it. > However if the column type in the database is decimal, then Spark SQL is > applying the rounding even to the original column. Attached image has the > spark sql code that shows the issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
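The behavior SPARK-26054 expects, namely that adding a rounded derived column leaves the source column's precision untouched, can be sketched outside Spark. The plain-Python example below uses hypothetical data (not the reporter's attached code) to show the intended semantics for decimal values:

```python
from decimal import Decimal, ROUND_HALF_UP

# Expected semantics: deriving a rounded column must not change the
# original column. Hypothetical rows standing in for a decimal-typed column.
rows = [{"amount": Decimal("12.3456")}, {"amount": Decimal("7.8912")}]

derived = [
    {**r, "amount_rounded": r["amount"].quantize(Decimal("0.01"),
                                                 rounding=ROUND_HALF_UP)}
    for r in rows
]

print(derived[0]["amount"])          # 12.3456 (original precision preserved)
print(derived[0]["amount_rounded"])  # 12.35
```

The bug report is that Spark's decimal path applied the rounding to both columns, whereas the float/double path correctly kept them distinct as above.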
[jira] [Assigned] (SPARK-26035) Break large streaming/tests.py files into smaller files
[ https://issues.apache.org/jira/browse/SPARK-26035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26035: Assignee: (was: Apache Spark) > Break large streaming/tests.py files into smaller files > --- > > Key: SPARK-26035 > URL: https://issues.apache.org/jira/browse/SPARK-26035 > Project: Spark > Issue Type: Sub-task > Components: DStreams, PySpark >Affects Versions: 2.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26035) Break large streaming/tests.py files into smaller files
[ https://issues.apache.org/jira/browse/SPARK-26035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26035: Assignee: Apache Spark > Break large streaming/tests.py files into smaller files > --- > > Key: SPARK-26035 > URL: https://issues.apache.org/jira/browse/SPARK-26035 > Project: Spark > Issue Type: Sub-task > Components: DStreams, PySpark >Affects Versions: 2.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23831) Add org.apache.derby to IsolatedClientLoader
[ https://issues.apache.org/jira/browse/SPARK-23831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686640#comment-16686640 ] Hyukjin Kwon commented on SPARK-23831: -- [~marmbrus], what made you come here? Did reverting this actually break something? > Add org.apache.derby to IsolatedClientLoader > > > Key: SPARK-23831 > URL: https://issues.apache.org/jira/browse/SPARK-23831 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Yuming Wang >Priority: Major > > Add org.apache.derby to IsolatedClientLoader, otherwise it may throw an > exception: > {noformat} > [info] Cause: java.sql.SQLException: Failed to start database 'metastore_db' > with class loader > org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@2439ab23, see > the next exception for details. > [info] at > org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) > [info] at > org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) > [info] at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source) > [info] at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown > Source) > [info] at org.apache.derby.impl.jdbc.EmbedConnection.(Unknown Source) > [info] at org.apache.derby.jdbc.InternalDriver$1.run(Unknown Source) > {noformat} > How to reproduce: > {noformat} > sed 's/HiveExternalCatalogSuite/HiveExternalCatalog2Suite/g' > sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogSuite.scala > > > sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalog2Suite.scala > build/sbt -Phive "hive/test-only *.HiveExternalCatalogSuite > *.HiveExternalCatalog2Suite" > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26064) Unable to fetch jar from remote repo while running spark-submit on kubernetes
Bala Bharath Reddy Resapu created SPARK-26064: - Summary: Unable to fetch jar from remote repo while running spark-submit on kubernetes Key: SPARK-26064 URL: https://issues.apache.org/jira/browse/SPARK-26064 Project: Spark Issue Type: Question Components: Kubernetes Affects Versions: 2.3.2 Reporter: Bala Bharath Reddy Resapu I am trying to run Spark on Kubernetes with a docker image. My requirement is to download the jar from an external repo while running spark-submit. I am able to download the jar using wget in the container, but it does not work when I pass the URL to the spark-submit command. I am not packaging the jar with the docker image; it works fine when the jar file is inside the docker image.
./bin/spark-submit \
 --master k8s://[https://ip:port|https://ipport/] \
 --deploy-mode cluster \
 --name test3 \
 --class hello \
 --conf spark.kubernetes.container.image.pullSecrets=abcd \
 --conf spark.kubernetes.container.image=spark:h2.0 \
 [https://devops.com/artifactory/local/testing/testing_2.11/h|https://bala.bharath.reddy.resapu%40ibm.com:akcp5bcbktykg2ti28sju4gtebsqwkg2mqkaf9w6g5rdbo3iwrwx7qb1m5dokgd54hdru2...@na.artifactory.swg-devops.com/artifactory/txo-cedp-garage-artifacts-sbt-local/testing/testing_2.11/arithmetic.jar]ello.jar
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26054) Creating a computed column applying the spark sql rounding on a column of type decimal affects the orginal column as well.
[ https://issues.apache.org/jira/browse/SPARK-26054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido updated SPARK-26054: Affects Version/s: (was: 2.4.0) 2.2.0 > Creating a computed column applying the spark sql rounding on a column of > type decimal affects the orginal column as well. > -- > > Key: SPARK-26054 > URL: https://issues.apache.org/jira/browse/SPARK-26054 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Jaya Krishna >Priority: Minor > Attachments: sparksql-rounding.png > > > When a computed column that rounds the value is added to a data frame, it is > affecting the value of the original column as well. The behavior depends on > the database column type - If it is either float or double, the result is as > expected - the original column will have its own formatting and the computed > column will be rounded as per the rounding definition specified for it. > However if the column type in the database is decimal, then Spark SQL is > applying the rounding even to the original column. Attached image has the > spark sql code that shows the issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26054) Creating a computed column applying the spark sql rounding on a column of type decimal affects the orginal column as well.
[ https://issues.apache.org/jira/browse/SPARK-26054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido updated SPARK-26054: Component/s: (was: Spark Core) SQL > Creating a computed column applying the spark sql rounding on a column of > type decimal affects the orginal column as well. > -- > > Key: SPARK-26054 > URL: https://issues.apache.org/jira/browse/SPARK-26054 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Jaya Krishna >Priority: Minor > Attachments: sparksql-rounding.png > > > When a computed column that rounds the value is added to a data frame, it is > affecting the value of the original column as well. The behavior depends on > the database column type - If it is either float or double, the result is as > expected - the original column will have its own formatting and the computed > column will be rounded as per the rounding definition specified for it. > However if the column type in the database is decimal, then Spark SQL is > applying the rounding even to the original column. Attached image has the > spark sql code that shows the issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26054) Creating a computed column applying the spark sql rounding on a column of type decimal affects the orginal column as well.
[ https://issues.apache.org/jira/browse/SPARK-26054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido resolved SPARK-26054. - Resolution: Cannot Reproduce > Creating a computed column applying the spark sql rounding on a column of > type decimal affects the orginal column as well. > -- > > Key: SPARK-26054 > URL: https://issues.apache.org/jira/browse/SPARK-26054 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Jaya Krishna >Priority: Minor > Attachments: sparksql-rounding.png > > > When a computed column that rounds the value is added to a data frame, it is > affecting the value of the original column as well. The behavior depends on > the database column type - If it is either float or double, the result is as > expected - the original column will have its own formatting and the computed > column will be rounded as per the rounding definition specified for it. > However if the column type in the database is decimal, then Spark SQL is > applying the rounding even to the original column. Attached image has the > spark sql code that shows the issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26054) Creating a computed column applying the spark sql rounding on a column of type decimal affects the orginal column as well.
[ https://issues.apache.org/jira/browse/SPARK-26054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686395#comment-16686395 ] Marco Gaido commented on SPARK-26054: - Then the affected version is 2.2.0, not 2.4.0. I am updating this. I'll also close this ticket as it is fixed in the current version. Please fill in the JIRA more carefully next time. Thanks. > Creating a computed column applying the spark sql rounding on a column of > type decimal affects the orginal column as well. > -- > > Key: SPARK-26054 > URL: https://issues.apache.org/jira/browse/SPARK-26054 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Jaya Krishna >Priority: Minor > Attachments: sparksql-rounding.png > > > When a computed column that rounds the value is added to a data frame, it is > affecting the value of the original column as well. The behavior depends on > the database column type - If it is either float or double, the result is as > expected - the original column will have its own formatting and the computed > column will be rounded as per the rounding definition specified for it. > However if the column type in the database is decimal, then Spark SQL is > applying the rounding even to the original column. Attached image has the > spark sql code that shows the issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26054) Creating a computed column applying the spark sql rounding on a column of type decimal affects the orginal column as well.
[ https://issues.apache.org/jira/browse/SPARK-26054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686386#comment-16686386 ] Jaya Krishna commented on SPARK-26054: -- Are you not seeing the issue even with the BigDecimal data type? I am using the embedded Spark in Zeppelin. In our product we use Spark 2.2.0. Maybe the issue is fixed in a later version of Spark. I will check with the latest Spark release. Thanks for the quick response. > Creating a computed column applying the spark sql rounding on a column of > type decimal affects the orginal column as well. > -- > > Key: SPARK-26054 > URL: https://issues.apache.org/jira/browse/SPARK-26054 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Jaya Krishna >Priority: Minor > Attachments: sparksql-rounding.png > > > When a computed column that rounds the value is added to a data frame, it is > affecting the value of the original column as well. The behavior depends on > the database column type - If it is either float or double, the result is as > expected - the original column will have its own formatting and the computed > column will be rounded as per the rounding definition specified for it. > However if the column type in the database is decimal, then Spark SQL is > applying the rounding even to the original column. Attached image has the > spark sql code that shows the issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26054) Creating a computed column applying the spark sql rounding on a column of type decimal affects the orginal column as well.
[ https://issues.apache.org/jira/browse/SPARK-26054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686378#comment-16686378 ] Marco Gaido commented on SPARK-26054: - Yes, sorry, I forgot to copy its definition. It is: {code} case class AA(id: String, amount: BigDecimal) {code} > Creating a computed column applying the spark sql rounding on a column of > type decimal affects the orginal column as well. > -- > > Key: SPARK-26054 > URL: https://issues.apache.org/jira/browse/SPARK-26054 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Jaya Krishna >Priority: Minor > Attachments: sparksql-rounding.png > > > When a computed column that rounds the value is added to a data frame, it is > affecting the value of the original column as well. The behavior depends on > the database column type - If it is either float or double, the result is as > expected - the original column will have its own formatting and the computed > column will be rounded as per the rounding definition specified for it. > However if the column type in the database is decimal, then Spark SQL is > applying the rounding even to the original column. Attached image has the > spark sql code that shows the issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26063) CatalystDataToAvro gives "UnresolvedException: Invalid call to dataType on unresolved object" when requested for numberedTreeString
Jacek Laskowski created SPARK-26063: --- Summary: CatalystDataToAvro gives "UnresolvedException: Invalid call to dataType on unresolved object" when requested for numberedTreeString Key: SPARK-26063 URL: https://issues.apache.org/jira/browse/SPARK-26063 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.0 Reporter: Jacek Laskowski The following gives {{org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'id}}:
{code:java}
// ./bin/spark-shell --packages org.apache.spark:spark-avro_2.12:2.4.0
scala> spark.version
res0: String = 2.4.0

import org.apache.spark.sql.avro._
val q = spark.range(1).withColumn("to_avro_id", to_avro('id))
val logicalPlan = q.queryExecution.logical

scala> logicalPlan.expressions.drop(1).head.numberedTreeString
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'id
  at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:105)
  at org.apache.spark.sql.avro.CatalystDataToAvro.simpleString(CatalystDataToAvro.scala:56)
  at org.apache.spark.sql.catalyst.expressions.Expression.verboseString(Expression.scala:233)
  at org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:548)
  at org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:569)
  at org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:472)
  at org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:469)
  at org.apache.spark.sql.catalyst.trees.TreeNode.numberedTreeString(TreeNode.scala:483)
  ... 51 elided
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26054) Creating a computed column applying the spark sql rounding on a column of type decimal affects the orginal column as well.
[ https://issues.apache.org/jira/browse/SPARK-26054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686373#comment-16686373 ] Jaya Krishna commented on SPARK-26054: -- Sorry for the confusion. I actually joined screen shots of several sections of the zeppelin workbook and I changed the value in between. Attached the correct picture now. Have you tried by defining the case class AA as "case class AA (id: String, amount: BigDecimal)"? > Creating a computed column applying the spark sql rounding on a column of > type decimal affects the orginal column as well. > -- > > Key: SPARK-26054 > URL: https://issues.apache.org/jira/browse/SPARK-26054 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Jaya Krishna >Priority: Minor > Attachments: sparksql-rounding.png > > > When a computed column that rounds the value is added to a data frame, it is > affecting the value of the original column as well. The behavior depends on > the database column type - If it is either float or double, the result is as > expected - the original column will have its own formatting and the computed > column will be rounded as per the rounding definition specified for it. > However if the column type in the database is decimal, then Spark SQL is > applying the rounding even to the original column. Attached image has the > spark sql code that shows the issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26054) Creating a computed column applying the spark sql rounding on a column of type decimal affects the orginal column as well.
[ https://issues.apache.org/jira/browse/SPARK-26054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jaya Krishna updated SPARK-26054: - Attachment: (was: sparksql-rounding.png) > Creating a computed column applying the spark sql rounding on a column of > type decimal affects the orginal column as well. > -- > > Key: SPARK-26054 > URL: https://issues.apache.org/jira/browse/SPARK-26054 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Jaya Krishna >Priority: Minor > Attachments: sparksql-rounding.png > > > When a computed column that rounds the value is added to a data frame, it is > affecting the value of the original column as well. The behavior depends on > the database column type - If it is either float or double, the result is as > expected - the original column will have its own formatting and the computed > column will be rounded as per the rounding definition specified for it. > However if the column type in the database is decimal, then Spark SQL is > applying the rounding even to the original column. Attached image has the > spark sql code that shows the issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26054) Creating a computed column applying the spark sql rounding on a column of type decimal affects the orginal column as well.
[ https://issues.apache.org/jira/browse/SPARK-26054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jaya Krishna updated SPARK-26054: - Attachment: sparksql-rounding.png > Creating a computed column applying the spark sql rounding on a column of > type decimal affects the orginal column as well. > -- > > Key: SPARK-26054 > URL: https://issues.apache.org/jira/browse/SPARK-26054 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Jaya Krishna >Priority: Minor > Attachments: sparksql-rounding.png > > > When a computed column that rounds the value is added to a data frame, it is > affecting the value of the original column as well. The behavior depends on > the database column type - If it is either float or double, the result is as > expected - the original column will have its own formatting and the computed > column will be rounded as per the rounding definition specified for it. > However if the column type in the database is decimal, then Spark SQL is > applying the rounding even to the original column. Attached image has the > spark sql code that shows the issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26035) Break large streaming/tests.py files into smaller files
[ https://issues.apache.org/jira/browse/SPARK-26035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-26035: - Target Version/s: 3.0.0 Fix Version/s: (was: 3.0.0) > Break large streaming/tests.py files into smaller files > --- > > Key: SPARK-26035 > URL: https://issues.apache.org/jira/browse/SPARK-26035 > Project: Spark > Issue Type: Sub-task > Components: DStreams, PySpark >Affects Versions: 2.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26062) Rename spark-avro external module to spark-sql-avro (to match spark-sql-kafka)
Jacek Laskowski created SPARK-26062: --- Summary: Rename spark-avro external module to spark-sql-avro (to match spark-sql-kafka) Key: SPARK-26062 URL: https://issues.apache.org/jira/browse/SPARK-26062 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: Jacek Laskowski Given the name of {{spark-sql-kafka}} external module it seems appropriate (and consistent) to rename {{spark-avro}} external module to {{spark-sql-avro}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26036) Break large tests.py files into smaller files
[ https://issues.apache.org/jira/browse/SPARK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26036: Assignee: (was: Apache Spark) > Break large tests.py files into smaller files > - > > Key: SPARK-26036 > URL: https://issues.apache.org/jira/browse/SPARK-26036 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Spark Core >Affects Versions: 2.4.0 >Reporter: Hyukjin Kwon >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26054) Creating a computed column applying the spark sql rounding on a column of type decimal affects the orginal column as well.
[ https://issues.apache.org/jira/browse/SPARK-26054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686333#comment-16686333 ] Marco Gaido commented on SPARK-26054: -
{code}
val data = Seq(AA("0101", "2500.98".toDouble), AA("0102", "5690.9876".toDouble))
val rdd = sparkContext.parallelize(data)
val df = rdd.toDF
df.select($"id", $"amount", round($"amount", 2)).show()
{code}
returns
{code}
+----+---------+----------------+
|  id|   amount|round(amount, 2)|
+----+---------+----------------+
|0101|  2500.98|         2500.98|
|0102|5690.9876|         5690.99|
+----+---------+----------------+
{code}
Please check what you are doing... your example seems pretty strange: the values returned in the double example are very different from the string values... > Creating a computed column applying the spark sql rounding on a column of > type decimal affects the orginal column as well. > -- > > Key: SPARK-26054 > URL: https://issues.apache.org/jira/browse/SPARK-26054 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Jaya Krishna >Priority: Minor > Attachments: sparksql-rounding.png > > > When a computed column that rounds the value is added to a data frame, it is > affecting the value of the original column as well. The behavior depends on > the database column type - If it is either float or double, the result is as > expected - the original column will have its own formatting and the computed > column will be rounded as per the rounding definition specified for it. > However if the column type in the database is decimal, then Spark SQL is > applying the rounding even to the original column. Attached image has the > spark sql code that shows the issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
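The comment above derives the rounded value as a new column while the amount column keeps its full precision. As a rough analogy outside Spark (this is plain Python's decimal module, not Spark's code path; the row values are copied from the report), rounding half-up to two decimal places without touching the source value can be sketched as:

```python
from decimal import Decimal, ROUND_HALF_UP

# Stand-in for the (id, amount) rows from the report.
rows = [("0101", Decimal("2500.98")), ("0102", Decimal("5690.9876"))]

# Compute the rounded value as a *new* field; the source amount is not mutated.
rounded = [(rid, amt, amt.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))
           for rid, amt in rows]

for rid, amt, rnd in rounded:
    print(rid, amt, rnd)
```

This is consistent with the 5690.9876 -> 5690.99 result in the Spark output above, since Spark SQL's `round` also rounds half-up.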
[jira] [Commented] (SPARK-26036) Break large tests.py files into smaller files
[ https://issues.apache.org/jira/browse/SPARK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686328#comment-16686328 ] Apache Spark commented on SPARK-26036: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/23033 > Break large tests.py files into smaller files > - > > Key: SPARK-26036 > URL: https://issues.apache.org/jira/browse/SPARK-26036 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Spark Core >Affects Versions: 2.4.0 >Reporter: Hyukjin Kwon >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26036) Break large tests.py files into smaller files
[ https://issues.apache.org/jira/browse/SPARK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26036: Assignee: Apache Spark > Break large tests.py files into smaller files > - > > Key: SPARK-26036 > URL: https://issues.apache.org/jira/browse/SPARK-26036 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Spark Core >Affects Versions: 2.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26036) Break large tests.py files into smaller files
[ https://issues.apache.org/jira/browse/SPARK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686327#comment-16686327 ] Apache Spark commented on SPARK-26036: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/23033 > Break large tests.py files into smaller files > - > > Key: SPARK-26036 > URL: https://issues.apache.org/jira/browse/SPARK-26036 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Spark Core >Affects Versions: 2.4.0 >Reporter: Hyukjin Kwon >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26054) Creating a computed column applying the spark sql rounding on a column of type decimal affects the orginal column as well.
[ https://issues.apache.org/jira/browse/SPARK-26054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686318#comment-16686318 ] Jaya Krishna commented on SPARK-26054: -- Hmm. There seems to be an issue if we start with an RDD, convert it to a DataFrame, and then do these operations. Can you try as follows:
{code}
case class AA(id: String, amount: BigDecimal)

val data = Seq(AA("0101", "2500.98".toDouble), AA("0102", "5690.9876".toDouble))
var rdd = sc.parallelize(data)
//val df = Seq(AA("0101", "2500.98".toDouble), AA("0102", "5690.9876".toDouble)).toDF
val df = rdd.toDF
df.select($"id", $"amount", round($"amount", 2)).show()
{code}
> Creating a computed column applying the spark sql rounding on a column of > type decimal affects the orginal column as well. > -- > > Key: SPARK-26054 > URL: https://issues.apache.org/jira/browse/SPARK-26054 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Jaya Krishna >Priority: Minor > Attachments: sparksql-rounding.png > > > When a computed column that rounds the value is added to a data frame, it is > affecting the value of the original column as well. The behavior depends on > the database column type - If it is either float or double, the result is as > expected - the original column will have its own formatting and the computed > column will be rounded as per the rounding definition specified for it. > However if the column type in the database is decimal, then Spark SQL is > applying the rounding even to the original column. Attached image has the > spark sql code that shows the issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
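One detail worth noting in the repro above, as a general observation about number representation rather than a claim about Spark internals: the amounts are written as "2500.98".toDouble before landing in a BigDecimal field, and a decimal built from a binary double carries the double's representation error, whereas one built from a string keeps exactly the digits that were written. Python's decimal module shows the same effect:

```python
from decimal import Decimal

# From a string: exactly the digits that were written, scale 2.
from_string = Decimal("2500.98")

# From a binary double: the decimal inherits the float's representation
# error, so it is close to, but not exactly, 2500.98.
from_double = Decimal(2500.98)

print(from_string)
print(from_double)  # a long expansion, not exactly 2500.98
```

This kind of mismatch between the intended scale and the stored value is one reason a decimal column can display differently from the float/double case described in the ticket.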
[jira] [Assigned] (SPARK-26061) Reduce the number of unused UnsafeRowWriters created in whole-stage codegen
[ https://issues.apache.org/jira/browse/SPARK-26061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26061: Assignee: Apache Spark > Reduce the number of unused UnsafeRowWriters created in whole-stage codegen > --- > > Key: SPARK-26061 > URL: https://issues.apache.org/jira/browse/SPARK-26061 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0 >Reporter: Kris Mok >Assignee: Apache Spark >Priority: Trivial > > Reduce the number of unused UnsafeRowWriters created in whole-stage generated > code. > They come from the CodegenSupport.consume() calling prepareRowVar(), which > uses GenerateUnsafeProjection.createCode() and registers an UnsafeRowWriter > mutable state, regardless of whether or not the downstream (parent) operator > will use the rowVar or not. > Even when the downstream doConsume function doesn't use the rowVar (i.e. > doesn't put row.code as a part of this operator's codegen template), the > registered UnsafeRowWriter stays there, which makes the init function of the > generated code a bit bloated. > This ticket doesn't track the root issue, but makes it slightly less painful: > when the doConsume function is split out, the prepareRowVar() function is > called twice, so it's double the pain of unused UnsafeRowWriters. This fix > simply moves the original call to prepareRowVar() down into the doConsume > split/no-split branch so that we're back to just 1x the pain. > To fix the root issue, something that allows the CodegenSupport operators to > indicate whether or not they're going to use the rowVar would be needed. > That's a much more elaborate change so I'd like to just make a minor fix > first. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26061) Reduce the number of unused UnsafeRowWriters created in whole-stage codegen
[ https://issues.apache.org/jira/browse/SPARK-26061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686299#comment-16686299 ]

Apache Spark commented on SPARK-26061:
--------------------------------------

User 'rednaxelafx' has created a pull request for this issue:
https://github.com/apache/spark/pull/23032

> Reduce the number of unused UnsafeRowWriters created in whole-stage codegen
> ---------------------------------------------------------------------------
[jira] [Commented] (SPARK-26061) Reduce the number of unused UnsafeRowWriters created in whole-stage codegen
[ https://issues.apache.org/jira/browse/SPARK-26061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686300#comment-16686300 ]

Apache Spark commented on SPARK-26061:
--------------------------------------

User 'rednaxelafx' has created a pull request for this issue:
https://github.com/apache/spark/pull/23032

> Reduce the number of unused UnsafeRowWriters created in whole-stage codegen
> ---------------------------------------------------------------------------
[jira] [Assigned] (SPARK-26061) Reduce the number of unused UnsafeRowWriters created in whole-stage codegen
[ https://issues.apache.org/jira/browse/SPARK-26061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-26061:
------------------------------------

    Assignee:     (was: Apache Spark)

> Reduce the number of unused UnsafeRowWriters created in whole-stage codegen
> ---------------------------------------------------------------------------
[jira] [Created] (SPARK-26061) Reduce the number of unused UnsafeRowWriters created in whole-stage codegen
Kris Mok created SPARK-26061:
--------------------------------

             Summary: Reduce the number of unused UnsafeRowWriters created in whole-stage codegen
                 Key: SPARK-26061
                 URL: https://issues.apache.org/jira/browse/SPARK-26061
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0, 2.3.2, 2.3.1, 2.3.0
            Reporter: Kris Mok

Reduce the number of unused UnsafeRowWriters created in whole-stage generated code.

They come from CodegenSupport.consume() calling prepareRowVar(), which uses GenerateUnsafeProjection.createCode() and registers an UnsafeRowWriter mutable state, regardless of whether the downstream (parent) operator will actually use the rowVar. Even when the downstream doConsume function doesn't use the rowVar (i.e. doesn't put row.code into this operator's codegen template), the registered UnsafeRowWriter stays there, which bloats the init function of the generated code.

This ticket doesn't track the root issue, but makes it slightly less painful: when the doConsume function is split out, prepareRowVar() is called twice, so the number of unused UnsafeRowWriters doubles. This fix simply moves the original call to prepareRowVar() down into the doConsume split/no-split branch, so that we're back to just 1x the pain.

Fixing the root issue would require a way for CodegenSupport operators to indicate whether they're going to use the rowVar. That's a much more elaborate change, so I'd like to make this minor fix first.
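The effect described above can be illustrated with a toy model of the codegen context. The classes below are simplified stand-ins, not Spark's actual CodegenSupport/CodegenContext API; they only model the side effect of prepareRowVar() registering a mutable-state field whether or not the parent ever consumes the row:

```scala
import scala.collection.mutable.ArrayBuffer

// Toy stand-in for Spark's codegen context: every registered mutable state
// becomes a field (and init code) in the generated class, used or not.
class ToyCodegenContext {
  val mutableStates = ArrayBuffer.empty[String]
  def addMutableState(tpe: String): Unit = mutableStates += tpe
}

class ToyOperator(ctx: ToyCodegenContext) {
  // Mirrors prepareRowVar(): producing the row variable registers an
  // UnsafeRowWriter as a side effect.
  def prepareRowVar(): String = {
    ctx.addMutableState("UnsafeRowWriter")
    "rowVar"
  }

  // Before the fix: the row variable is prepared unconditionally, so a
  // writer is registered even when the parent never uses the row.
  def consumeBefore(parentUsesRow: Boolean): Unit = {
    val rowVar = prepareRowVar()
    if (parentUsesRow) emit(rowVar)
  }

  // After the fix (simplified): prepareRowVar() is only called on the
  // branch that actually needs the row, so no unused writer is registered.
  def consumeAfter(parentUsesRow: Boolean): Unit =
    if (parentUsesRow) emit(prepareRowVar())

  private def emit(code: String): Unit = () // placeholder for codegen output
}
```

With `parentUsesRow = false`, `consumeBefore` still leaves one `"UnsafeRowWriter"` entry in `mutableStates`, while `consumeAfter` leaves none; the actual patch applies the same move inside the doConsume split/no-split branch.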
[jira] [Commented] (SPARK-26054) Creating a computed column applying the spark sql rounding on a column of type decimal affects the original column as well.
[ https://issues.apache.org/jira/browse/SPARK-26054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686293#comment-16686293 ]

Marco Gaido commented on SPARK-26054:
-------------------------------------

I cannot reproduce this:
{code}
val df = Seq(AA("0101", "2500.98".toDouble), AA("0102", "5690.9876".toDouble)).toDF
df.select($"id", $"amount", round($"amount", 2)).show()
{code}
returned
{code}
+----+------------+----------------+
|  id|      amount|round(amount, 2)|
+----+------------+----------------+
|0101|2500.9800...|         2500.98|
|0102|5690.9876...|         5690.99|
+----+------------+----------------+
{code}
Moreover, the values in the image you posted are pretty weird: even the double values are very different from what is represented in the strings.

> Creating a computed column applying the spark sql rounding on a column of
> type decimal affects the original column as well.
> --------------------------------------------------------------------------
>
>                 Key: SPARK-26054
>                 URL: https://issues.apache.org/jira/browse/SPARK-26054
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Jaya Krishna
>            Priority: Minor
>         Attachments: sparksql-rounding.png
>
> When a computed column that rounds the value is added to a data frame, it
> affects the value of the original column as well. The behavior depends on the
> database column type. If it is either float or double, the result is as
> expected: the original column keeps its own formatting and the computed
> column is rounded as per the rounding definition specified for it. However,
> if the column type in the database is decimal, Spark SQL applies the rounding
> even to the original column. The attached image has the Spark SQL code that
> shows the issue.
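The expected behavior the reporter describes can be sketched without Spark, using plain `scala.math.BigDecimal` (this is only an analogy for `round(col, 2)` on a decimal column, not Spark's implementation; the values are the ones from the comment above):

```scala
import scala.math.BigDecimal.RoundingMode

// Stand-ins for the `amount` column values from the report.
val amounts = Seq(BigDecimal("2500.98"), BigDecimal("5690.9876"))

// Rounding produces *new* values; BigDecimal is immutable, so the
// originals are untouched -- the behavior expected of a computed
// rounded column alongside the original column.
val rounded = amounts.map(_.setScale(2, RoundingMode.HALF_UP))

println(amounts.mkString(", "))  // 2500.98, 5690.9876
println(rounded.mkString(", "))  // 2500.98, 5690.99
```

Since rounding is value-producing rather than mutating, a rounded computed column should never alter the source column; the reported symptom would have to come from how the decimal column is displayed or projected, not from the rounding itself.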
[jira] [Assigned] (SPARK-26060) Track SparkConf entries and make SET command reject such entries.
[ https://issues.apache.org/jira/browse/SPARK-26060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-26060:
------------------------------------

    Assignee: Apache Spark

> Track SparkConf entries and make SET command reject such entries.
> -----------------------------------------------------------------
[jira] [Commented] (SPARK-26060) Track SparkConf entries and make SET command reject such entries.
[ https://issues.apache.org/jira/browse/SPARK-26060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686233#comment-16686233 ]

Apache Spark commented on SPARK-26060:
--------------------------------------

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/23031

> Track SparkConf entries and make SET command reject such entries.
> -----------------------------------------------------------------
[jira] [Assigned] (SPARK-26060) Track SparkConf entries and make SET command reject such entries.
[ https://issues.apache.org/jira/browse/SPARK-26060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-26060:
------------------------------------

    Assignee:     (was: Apache Spark)

> Track SparkConf entries and make SET command reject such entries.
> -----------------------------------------------------------------
[jira] [Created] (SPARK-26060) Track SparkConf entries and make SET command reject such entries.
Takuya Ueshin created SPARK-26060:
-------------------------------------

             Summary: Track SparkConf entries and make SET command reject such entries.
                 Key: SPARK-26060
                 URL: https://issues.apache.org/jira/browse/SPARK-26060
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core, SQL
    Affects Versions: 2.4.0
            Reporter: Takuya Ueshin

Currently the {{SET}} command works without any warning even if the specified key is for a {{SparkConf}} entry, and it has no effect because the command does not update {{SparkConf}}; this behavior might confuse users. We should track {{SparkConf}} entries and make the command reject such entries.
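The tracking-and-rejecting idea can be sketched as follows. All names here (ConfRegistry, SetCommand) are hypothetical stand-ins for illustration, not Spark's actual classes:

```scala
// Registry of keys known to be SparkConf (driver-startup) entries.
// In Spark this would be populated where conf entries are defined.
object ConfRegistry {
  private var sparkConfKeys = Set.empty[String]
  def registerSparkConf(key: String): Unit = sparkConfKeys += key
  def isSparkConf(key: String): Boolean = sparkConfKeys.contains(key)
}

// Toy model of the SQL SET command's session-level configuration.
class SetCommand {
  private var sessionConf = Map.empty[String, String]

  // Reject SparkConf keys: setting them here would silently have no
  // effect, which is exactly the confusing behavior the ticket describes.
  def set(key: String, value: String): Unit = {
    require(!ConfRegistry.isSparkConf(key),
      s"Cannot modify Spark config at runtime: $key")
    sessionConf += (key -> value)
  }

  def get(key: String): Option[String] = sessionConf.get(key)
}
```

With this shape, `SET spark.sql.shuffle.partitions=10` would still succeed as a session setting, while `SET spark.master=local` would fail fast instead of being silently ignored.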