[jira] [Assigned] (SPARK-42722) Python Connect def schema() should not cache the schema
[ https://issues.apache.org/jira/browse/SPARK-42722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42722: Assignee: Rui Wang (was: Apache Spark) > Python Connect def schema() should not cache the schema > > > Key: SPARK-42722 > URL: https://issues.apache.org/jira/browse/SPARK-42722 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42721) Add an Interceptor to log RPCs in connect-server
[ https://issues.apache.org/jira/browse/SPARK-42721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698057#comment-17698057 ] Apache Spark commented on SPARK-42721: -- User 'rangadi' has created a pull request for this issue: https://github.com/apache/spark/pull/40342 > Add an Interceptor to log RPCs in connect-server > > > Key: SPARK-42721 > URL: https://issues.apache.org/jira/browse/SPARK-42721 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Priority: Major > Fix For: 3.5.0 > > > It would be useful to be able to log RPCs to the connect server during > development. It makes it simpler to see the flow of messages. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
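For illustration, a minimal gRPC server interceptor that logs the method name of each incoming RPC could look like the sketch below; the class name and wiring are assumptions for illustration, not the interceptor added by the linked pull request.

{code:scala}
import io.grpc.{Metadata, ServerCall, ServerCallHandler, ServerInterceptor}

// Logs the full method name of every incoming RPC, then delegates to the
// next handler in the chain without altering the call.
class LoggingInterceptor extends ServerInterceptor {
  override def interceptCall[ReqT, RespT](
      call: ServerCall[ReqT, RespT],
      headers: Metadata,
      next: ServerCallHandler[ReqT, RespT]): ServerCall.Listener[ReqT] = {
    println(s"RPC received: ${call.getMethodDescriptor.getFullMethodName}")
    next.startCall(call, headers)
  }
}
{code}

Such an interceptor would typically be registered on the server builder, e.g. via ServerBuilder.intercept(new LoggingInterceptor) when the Connect gRPC server is constructed.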
[jira] [Assigned] (SPARK-42721) Add an Interceptor to log RPCs in connect-server
[ https://issues.apache.org/jira/browse/SPARK-42721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42721: Assignee: (was: Apache Spark) > Add an Interceptor to log RPCs in connect-server > > > Key: SPARK-42721 > URL: https://issues.apache.org/jira/browse/SPARK-42721 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Priority: Major > Fix For: 3.5.0 > > > It would be useful to be able to log RPCs to the connect server during > development. It makes it simpler to see the flow of messages. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42721) Add an Interceptor to log RPCs in connect-server
[ https://issues.apache.org/jira/browse/SPARK-42721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42721: Assignee: Apache Spark > Add an Interceptor to log RPCs in connect-server > > > Key: SPARK-42721 > URL: https://issues.apache.org/jira/browse/SPARK-42721 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Assignee: Apache Spark >Priority: Major > Fix For: 3.5.0 > > > It would be useful to be able to log RPCs to the connect server during > development. It makes it simpler to see the flow of messages. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42721) Add an Interceptor to log RPCs in connect-server
[ https://issues.apache.org/jira/browse/SPARK-42721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698055#comment-17698055 ] Apache Spark commented on SPARK-42721: -- User 'rangadi' has created a pull request for this issue: https://github.com/apache/spark/pull/40342 > Add an Interceptor to log RPCs in connect-server > > > Key: SPARK-42721 > URL: https://issues.apache.org/jira/browse/SPARK-42721 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Priority: Major > Fix For: 3.5.0 > > > It would be useful to be able to log RPCs to the connect server during > development. It makes it simpler to see the flow of messages. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42715) NegativeArraySizeException when too much data is read from an ORC file
[ https://issues.apache.org/jira/browse/SPARK-42715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697999#comment-17697999 ] Apache Spark commented on SPARK-42715: -- User 'chong0929' has created a pull request for this issue: https://github.com/apache/spark/pull/40341 > NegativeArraySizeException when too much data is read from an ORC file > --- > > Key: SPARK-42715 > URL: https://issues.apache.org/jira/browse/SPARK-42715 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.2 >Reporter: XiaoLong Wu >Priority: Minor > > Should we provide a friendlier exception message about how to avoid this > exception? For example, when we catch this exception, we could tell the user > that reducing the value of spark.sql.orc.columnarReaderBatchSize avoids it. > In the current version, batch reading of ORC files is done by the function > OrcColumnarBatchReader.nextBatch(), which depends on > [ORC|https://github.com/apache/orc] (version 1.8.2) to complete the data copy; > the relevant ORC code is as follows: > {code:java} > private static byte[] commonReadByteArrays(InStream stream, IntegerReader > lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) throws IOException { > // Read lengths > scratchlcv.isRepeating = result.isRepeating; > scratchlcv.noNulls = result.noNulls; > scratchlcv.isNull = result.isNull; // Notice we are replacing the isNull > vector here... > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); > int totalLength = 0; > if (!scratchlcv.isRepeating) { > for (int i = 0; i < batchSize; i++) { > if (!scratchlcv.isNull[i]) { > totalLength += (int) scratchlcv.vector[i]; > } > } > } else { > if (!scratchlcv.isNull[0]) { > totalLength = (int) (batchSize * scratchlcv.vector[0]); > } > } > // Read all the strings for this batch > byte[] allBytes = new byte[totalLength]; > int offset = 0; > int len = totalLength; > while (len > 0) { > int bytesRead = stream.read(allBytes, offset, len); > if (bytesRead < 0) { > throw new EOFException("Can't finish byte read from " + stream); > } > len -= bytesRead; > offset += bytesRead; > } > return allBytes; > } {code} > As shown above, the per-value lengths (long values) are summed into the int > totalLength, which is used to size the output array. If the total data size > exceeds Integer.MAX_VALUE, the conversion to int overflows to a negative value > and the following exception is thrown: > {code:java} > Caused by: java.lang.NegativeArraySizeException > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1998) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:2021) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:2119) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1962) > at > org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:65) > at > org.apache.orc.impl.reader.tree.StructBatchReader.nextBatchForLevel(StructBatchReader.java:100) > at > org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:77) > at > org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1371) > at > org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextBatch(OrcColumnarBatchReader.java:197) > at > org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextKeyValue(OrcColumnarBatchReader.java:99) > at > org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:274) > ... 20 more {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
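The overflow is easy to reproduce in isolation. The following self-contained sketch, with made-up lengths, shows how summing per-value long lengths into an int wraps to a negative value and makes the array allocation throw the same exception as the stack trace above:

{code:scala}
// Two ~1.5 GB values: the true total (3,000,000,000 bytes) exceeds
// Int.MaxValue (2,147,483,647), so the int accumulator wraps negative,
// exactly like the totalLength accumulation in the ORC code above.
val lengths = Seq(1500000000L, 1500000000L)
var totalLength = 0
for (len <- lengths) totalLength += len.toInt
println(totalLength) // -1294967296
// Allocating an array with a negative size throws
// java.lang.NegativeArraySizeException.
val allBytes = new Array[Byte](totalLength)
{code}

Until the message is improved, lowering spark.sql.orc.columnarReaderBatchSize reduces how many values are copied per batch, keeping the per-batch byte total below Int.MaxValue.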
[jira] [Assigned] (SPARK-42715) NegativeArraySizeException when too much data is read from an ORC file
[ https://issues.apache.org/jira/browse/SPARK-42715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42715: Assignee: Apache Spark > NegativeArraySizeException when too much data is read from an ORC file > --- > > Key: SPARK-42715 > URL: https://issues.apache.org/jira/browse/SPARK-42715 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.2 >Reporter: XiaoLong Wu >Assignee: Apache Spark >Priority: Minor > > Should we provide a friendlier exception message about how to avoid this > exception? For example, when we catch this exception, we could tell the user > that reducing the value of spark.sql.orc.columnarReaderBatchSize avoids it. > In the current version, batch reading of ORC files is done by the function > OrcColumnarBatchReader.nextBatch(), which depends on > [ORC|https://github.com/apache/orc] (version 1.8.2) to complete the data copy; > the relevant ORC code is as follows: > {code:java} > private static byte[] commonReadByteArrays(InStream stream, IntegerReader > lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) throws IOException { > // Read lengths > scratchlcv.isRepeating = result.isRepeating; > scratchlcv.noNulls = result.noNulls; > scratchlcv.isNull = result.isNull; // Notice we are replacing the isNull > vector here... > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); > int totalLength = 0; > if (!scratchlcv.isRepeating) { > for (int i = 0; i < batchSize; i++) { > if (!scratchlcv.isNull[i]) { > totalLength += (int) scratchlcv.vector[i]; > } > } > } else { > if (!scratchlcv.isNull[0]) { > totalLength = (int) (batchSize * scratchlcv.vector[0]); > } > } > // Read all the strings for this batch > byte[] allBytes = new byte[totalLength]; > int offset = 0; > int len = totalLength; > while (len > 0) { > int bytesRead = stream.read(allBytes, offset, len); > if (bytesRead < 0) { > throw new EOFException("Can't finish byte read from " + stream); > } > len -= bytesRead; > offset += bytesRead; > } > return allBytes; > } {code} > As shown above, the per-value lengths (long values) are summed into the int > totalLength, which is used to size the output array. If the total data size > exceeds Integer.MAX_VALUE, the conversion to int overflows to a negative value > and the following exception is thrown: > {code:java} > Caused by: java.lang.NegativeArraySizeException > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1998) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:2021) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:2119) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1962) > at > org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:65) > at > org.apache.orc.impl.reader.tree.StructBatchReader.nextBatchForLevel(StructBatchReader.java:100) > at > org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:77) > at > org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1371) > at > org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextBatch(OrcColumnarBatchReader.java:197) > at > org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextKeyValue(OrcColumnarBatchReader.java:99) > at > org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:274) > ... 20 more {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42715) NegativeArraySizeException when too much data is read from an ORC file
[ https://issues.apache.org/jira/browse/SPARK-42715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42715: Assignee: (was: Apache Spark) > NegativeArraySizeException when too much data is read from an ORC file > --- > > Key: SPARK-42715 > URL: https://issues.apache.org/jira/browse/SPARK-42715 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.2 >Reporter: XiaoLong Wu >Priority: Minor > > Should we provide a friendlier exception message about how to avoid this > exception? For example, when we catch this exception, we could tell the user > that reducing the value of spark.sql.orc.columnarReaderBatchSize avoids it. > In the current version, batch reading of ORC files is done by the function > OrcColumnarBatchReader.nextBatch(), which depends on > [ORC|https://github.com/apache/orc] (version 1.8.2) to complete the data copy; > the relevant ORC code is as follows: > {code:java} > private static byte[] commonReadByteArrays(InStream stream, IntegerReader > lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) throws IOException { > // Read lengths > scratchlcv.isRepeating = result.isRepeating; > scratchlcv.noNulls = result.noNulls; > scratchlcv.isNull = result.isNull; // Notice we are replacing the isNull > vector here... > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); > int totalLength = 0; > if (!scratchlcv.isRepeating) { > for (int i = 0; i < batchSize; i++) { > if (!scratchlcv.isNull[i]) { > totalLength += (int) scratchlcv.vector[i]; > } > } > } else { > if (!scratchlcv.isNull[0]) { > totalLength = (int) (batchSize * scratchlcv.vector[0]); > } > } > // Read all the strings for this batch > byte[] allBytes = new byte[totalLength]; > int offset = 0; > int len = totalLength; > while (len > 0) { > int bytesRead = stream.read(allBytes, offset, len); > if (bytesRead < 0) { > throw new EOFException("Can't finish byte read from " + stream); > } > len -= bytesRead; > offset += bytesRead; > } > return allBytes; > } {code} > As shown above, the per-value lengths (long values) are summed into the int > totalLength, which is used to size the output array. If the total data size > exceeds Integer.MAX_VALUE, the conversion to int overflows to a negative value > and the following exception is thrown: > {code:java} > Caused by: java.lang.NegativeArraySizeException > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1998) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:2021) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:2119) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1962) > at > org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:65) > at > org.apache.orc.impl.reader.tree.StructBatchReader.nextBatchForLevel(StructBatchReader.java:100) > at > org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:77) > at > org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1371) > at > org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextBatch(OrcColumnarBatchReader.java:197) > at > org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextKeyValue(OrcColumnarBatchReader.java:99) > at > org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:274) > ... 20 more {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42701) Add the try_aes_decrypt() function
[ https://issues.apache.org/jira/browse/SPARK-42701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42701: Assignee: Max Gekk (was: Apache Spark) > Add the try_aes_decrypt() function > -- > > Key: SPARK-42701 > URL: https://issues.apache.org/jira/browse/SPARK-42701 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: starter > > Add the new function try_aes_decrypt(). The function aes_decrypt() fails with an > exception when it encounters a column value which it cannot decrypt. So, if a > column contains both bad and good input, it is impossible to decrypt even the > good input. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
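Assuming try_aes_decrypt() mirrors the signature of the existing aes_decrypt(), the intended behavior would look roughly like the following sketch (the key and inputs are illustrative):

{code:scala}
// aes_decrypt fails the whole query on the first undecryptable value;
// the proposed try_aes_decrypt should return NULL for that row instead.
val key = "abcdefghijklmnop" // illustrative 16-byte AES key
spark.sql(
  s"""SELECT CAST(try_aes_decrypt(unbase64(input), '$key') AS STRING) AS decrypted
     |FROM VALUES
     |  (base64(aes_encrypt('Spark', '$key'))), -- valid ciphertext: decrypts to 'Spark'
     |  (base64('not a ciphertext'))            -- bad input: yields NULL, not an exception
     |AS t(input)""".stripMargin).show()
{code}

With plain aes_decrypt(), the second row would abort the query, which is exactly the limitation described above.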
[jira] [Assigned] (SPARK-42701) Add the try_aes_decrypt() function
[ https://issues.apache.org/jira/browse/SPARK-42701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42701: Assignee: Apache Spark (was: Max Gekk) > Add the try_aes_decrypt() function > -- > > Key: SPARK-42701 > URL: https://issues.apache.org/jira/browse/SPARK-42701 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > Labels: starter > > Add the new function try_aes_decrypt(). The function aes_decrypt() fails with an > exception when it encounters a column value which it cannot decrypt. So, if a > column contains both bad and good input, it is impossible to decrypt even the > good input. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42701) Add the try_aes_decrypt() function
[ https://issues.apache.org/jira/browse/SPARK-42701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697923#comment-17697923 ] Apache Spark commented on SPARK-42701: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/40340 > Add the try_aes_decrypt() function > -- > > Key: SPARK-42701 > URL: https://issues.apache.org/jira/browse/SPARK-42701 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: starter > > Add the new function try_aes_decrypt(). The function aes_decrypt() fails with an > exception when it encounters a column value which it cannot decrypt. So, if a > column contains both bad and good input, it is impossible to decrypt even the > good input. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42719) `MapOutputTracker#getMapLocation` should respect `spark.shuffle.reduceLocality.enabled`
[ https://issues.apache.org/jira/browse/SPARK-42719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42719: Assignee: (was: Apache Spark) > `MapOutputTracker#getMapLocation` should respect > `spark.shuffle.reduceLocality.enabled` > > > Key: SPARK-42719 > URL: https://issues.apache.org/jira/browse/SPARK-42719 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: He Qi >Priority: Major > > Discussed at [https://github.com/apache/spark/pull/40307] > {{getPreferredLocations}} in {{ShuffledRowRDD}} should return {{Nil}} at the > very beginning in case {{spark.shuffle.reduceLocality.enabled = false}} > (conceptually). > This logic is pushed into MapOutputTracker though - and > {{getPreferredLocationsForShuffle}} honors > {{spark.shuffle.reduceLocality.enabled}} - but {{getMapLocation}} does not. > So the fix would be to change {{getMapLocation}} to honor the parameter. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
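Conceptually, the fix is the same guard that {{getPreferredLocationsForShuffle}} already applies. A simplified sketch of the intended shape (not the actual patch; the config constant name is assumed from Spark's internal config package):

{code:scala}
// Inside MapOutputTrackerMaster (simplified): skip the locality lookup
// entirely when reduce-side locality is disabled, mirroring the check in
// getPreferredLocationsForShuffle.
def getMapLocation(
    dep: ShuffleDependency[_, _, _],
    startMapIndex: Int,
    endMapIndex: Int): Seq[String] = {
  if (!conf.get(config.SHUFFLE_REDUCE_LOCALITY_ENABLE)) {
    Nil // no preferred locations when spark.shuffle.reduceLocality.enabled=false
  } else {
    // ... existing lookup of the map output locations ...
    Nil
  }
}
{code}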
[jira] [Commented] (SPARK-42719) `MapOutputTracker#getMapLocation` should respect `spark.shuffle.reduceLocality.enabled`
[ https://issues.apache.org/jira/browse/SPARK-42719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697891#comment-17697891 ] Apache Spark commented on SPARK-42719: -- User 'jerqi' has created a pull request for this issue: https://github.com/apache/spark/pull/40339 > `MapOutputTracker#getMapLocation` should respect > `spark.shuffle.reduceLocality.enabled` > > > Key: SPARK-42719 > URL: https://issues.apache.org/jira/browse/SPARK-42719 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: He Qi >Priority: Major > > Discussed at [https://github.com/apache/spark/pull/40307] > {{getPreferredLocations}} in {{ShuffledRowRDD}} should return {{Nil}} at the > very beginning in case {{spark.shuffle.reduceLocality.enabled = false}} > (conceptually). > This logic is pushed into MapOutputTracker though - and > {{getPreferredLocationsForShuffle}} honors > {{spark.shuffle.reduceLocality.enabled}} - but {{getMapLocation}} does not. > So the fix would be to change {{getMapLocation}} to honor the parameter. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42719) `MapOutputTracker#getMapLocation` should respect `spark.shuffle.reduceLocality.enabled`
[ https://issues.apache.org/jira/browse/SPARK-42719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42719: Assignee: Apache Spark > `MapOutputTracker#getMapLocation` should respect > `spark.shuffle.reduceLocality.enabled` > > > Key: SPARK-42719 > URL: https://issues.apache.org/jira/browse/SPARK-42719 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: He Qi >Assignee: Apache Spark >Priority: Major > > Discussed at [https://github.com/apache/spark/pull/40307] > {{getPreferredLocations}} in {{ShuffledRowRDD}} should return {{Nil}} at the > very beginning in case {{spark.shuffle.reduceLocality.enabled = false}} > (conceptually). > This logic is pushed into MapOutputTracker though - and > {{getPreferredLocationsForShuffle}} honors > {{spark.shuffle.reduceLocality.enabled}} - but {{getMapLocation}} does not. > So the fix would be to change {{getMapLocation}} to honor the parameter. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42718) Upgrade rocksdbjni to 7.10.2
[ https://issues.apache.org/jira/browse/SPARK-42718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42718: Assignee: Apache Spark > Upgrade rocksdbjni to 7.10.2 > > > Key: SPARK-42718 > URL: https://issues.apache.org/jira/browse/SPARK-42718 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > https://github.com/facebook/rocksdb/releases/tag/v7.10.2 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42718) Upgrade rocksdbjni to 7.10.2
[ https://issues.apache.org/jira/browse/SPARK-42718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42718: Assignee: (was: Apache Spark) > Upgrade rocksdbjni to 7.10.2 > > > Key: SPARK-42718 > URL: https://issues.apache.org/jira/browse/SPARK-42718 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > https://github.com/facebook/rocksdb/releases/tag/v7.10.2 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42718) Upgrade rocksdbjni to 7.10.2
[ https://issues.apache.org/jira/browse/SPARK-42718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697875#comment-17697875 ] Apache Spark commented on SPARK-42718: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40337 > Upgrade rocksdbjni to 7.10.2 > > > Key: SPARK-42718 > URL: https://issues.apache.org/jira/browse/SPARK-42718 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > https://github.com/facebook/rocksdb/releases/tag/v7.10.2 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42706) List the error classes in the user-facing documentation.
[ https://issues.apache.org/jira/browse/SPARK-42706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42706: Assignee: (was: Apache Spark) > List the error classes in the user-facing documentation. > -- > > Key: SPARK-42706 > URL: https://issues.apache.org/jira/browse/SPARK-42706 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > We need to add an error class list to the user-facing documentation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42706) List the error classes in the user-facing documentation.
[ https://issues.apache.org/jira/browse/SPARK-42706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42706: Assignee: Apache Spark > List the error classes in the user-facing documentation. > -- > > Key: SPARK-42706 > URL: https://issues.apache.org/jira/browse/SPARK-42706 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > We need to add an error class list to the user-facing documentation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42706) List the error classes in the user-facing documentation.
[ https://issues.apache.org/jira/browse/SPARK-42706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697863#comment-17697863 ] Apache Spark commented on SPARK-42706: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/40336 > List the error classes in the user-facing documentation. > -- > > Key: SPARK-42706 > URL: https://issues.apache.org/jira/browse/SPARK-42706 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > We need to add an error class list to the user-facing documentation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42717) Upgrade mysql-connector-java from 8.0.31 to 8.0.32
[ https://issues.apache.org/jira/browse/SPARK-42717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697850#comment-17697850 ] Apache Spark commented on SPARK-42717: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/40335 > Upgrade mysql-connector-java from 8.0.31 to 8.0.32 > -- > > Key: SPARK-42717 > URL: https://issues.apache.org/jira/browse/SPARK-42717 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42717) Upgrade mysql-connector-java from 8.0.31 to 8.0.32
[ https://issues.apache.org/jira/browse/SPARK-42717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697851#comment-17697851 ] Apache Spark commented on SPARK-42717: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/40335 > Upgrade mysql-connector-java from 8.0.31 to 8.0.32 > -- > > Key: SPARK-42717 > URL: https://issues.apache.org/jira/browse/SPARK-42717 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42717) Upgrade mysql-connector-java from 8.0.31 to 8.0.32
[ https://issues.apache.org/jira/browse/SPARK-42717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42717: Assignee: Apache Spark > Upgrade mysql-connector-java from 8.0.31 to 8.0.32 > -- > > Key: SPARK-42717 > URL: https://issues.apache.org/jira/browse/SPARK-42717 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42717) Upgrade mysql-connector-java from 8.0.31 to 8.0.32
[ https://issues.apache.org/jira/browse/SPARK-42717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42717: Assignee: (was: Apache Spark) > Upgrade mysql-connector-java from 8.0.31 to 8.0.32 > -- > > Key: SPARK-42717 > URL: https://issues.apache.org/jira/browse/SPARK-42717 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42716) DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per partition
[ https://issues.apache.org/jira/browse/SPARK-42716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697847#comment-17697847 ] Apache Spark commented on SPARK-42716: -- User 'EnricoMi' has created a pull request for this issue: https://github.com/apache/spark/pull/40334 > DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per > partition > -- > > Key: SPARK-42716 > URL: https://issues.apache.org/jira/browse/SPARK-42716 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.4.0, 3.4.1 >Reporter: Enrico Minack >Priority: Major > > From Spark 3.0.0 until 3.2.3, a DataSourceV2 could report its partitioning as > {{KeyGroupedPartitioning}} via {{SupportsReportPartitioning}}, even if > multiple keys belong to a partition. > Since SPARK-37377, the partition information reported through > {{SupportsReportPartitioning}} is considered by Catalyst only if all > partitions implement {{HasPartitionKey}}. But this limits the number of keys > per partition to 1. > Spark should continue to support the more general situation of > {{KeyGroupedPartitioning}} with multiple keys per partition, like > {{HashPartitioning}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
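For reference, this is the kind of connector the issue concerns: a scan that reports a key-grouped layout even though each of its partitions holds several key values, so {{HasPartitionKey}} (one key per partition) cannot be implemented. A sketch against the public connector API, with made-up class names:

{code:scala}
import org.apache.spark.sql.connector.expressions.{Expression, Expressions}
import org.apache.spark.sql.connector.read.partitioning.{KeyGroupedPartitioning, Partitioning}
import org.apache.spark.sql.connector.read.{Scan, SupportsReportPartitioning}
import org.apache.spark.sql.types.StructType

// Rows are clustered by `key`, but each of the numPartitions partitions
// contains many distinct key values, so HasPartitionKey does not apply.
class MultiKeyScan(numPartitions: Int) extends Scan with SupportsReportPartitioning {
  override def readSchema(): StructType =
    new StructType().add("key", "int").add("value", "string")

  override def outputPartitioning(): Partitioning =
    new KeyGroupedPartitioning(Array[Expression](Expressions.identity("key")), numPartitions)
}
{code}

Since SPARK-37377, the partitioning reported by such a scan is no longer taken into account unless every partition also implements {{HasPartitionKey}}.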
[jira] [Assigned] (SPARK-42716) DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per partition
[ https://issues.apache.org/jira/browse/SPARK-42716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42716: Assignee: (was: Apache Spark) > DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per > partition > -- > > Key: SPARK-42716 > URL: https://issues.apache.org/jira/browse/SPARK-42716 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.4.0, 3.4.1 >Reporter: Enrico Minack >Priority: Major > > From Spark 3.0.0 until 3.2.3, a DataSourceV2 could report its partitioning as > {{KeyGroupedPartitioning}} via {{SupportsReportPartitioning}}, even if > multiple keys belong to a partition. > Since SPARK-37377, the partition information reported through > {{SupportsReportPartitioning}} is considered by Catalyst only if all > partitions implement {{HasPartitionKey}}. But this limits the number of keys > per partition to 1. > Spark should continue to support the more general situation of > {{KeyGroupedPartitioning}} with multiple keys per partition, like > {{HashPartitioning}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42716) DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per partition
[ https://issues.apache.org/jira/browse/SPARK-42716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42716: Assignee: Apache Spark > DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per > partition > -- > > Key: SPARK-42716 > URL: https://issues.apache.org/jira/browse/SPARK-42716 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.4.0, 3.4.1 >Reporter: Enrico Minack >Assignee: Apache Spark >Priority: Major > > From Spark 3.0.0 until 3.2.3, a DataSourceV2 could report its partitioning as > {{KeyGroupedPartitioning}} via {{SupportsReportPartitioning}}, even if > multiple keys belong to a partition. > Since SPARK-37377, the partition information reported through > {{SupportsReportPartitioning}} is considered by Catalyst only if all > partitions implement {{HasPartitionKey}}. But this limits the number of keys > per partition to 1. > Spark should continue to support the more general situation of > {{KeyGroupedPartitioning}} with multiple keys per partition, like > {{HashPartitioning}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42716) DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per partition
[ https://issues.apache.org/jira/browse/SPARK-42716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697845#comment-17697845 ] Apache Spark commented on SPARK-42716: -- User 'EnricoMi' has created a pull request for this issue: https://github.com/apache/spark/pull/40334 > DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per > partition > -- > > Key: SPARK-42716 > URL: https://issues.apache.org/jira/browse/SPARK-42716 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.4.0, 3.4.1 >Reporter: Enrico Minack >Priority: Major > > From Spark 3.0.0 until 3.2.3, a DataSourceV2 could report its partitioning as > {{KeyGroupedPartitioning}} via {{SupportsReportPartitioning}}, even if > multiple keys belong to a partition. > Since SPARK-37377, the partition information reported through > {{SupportsReportPartitioning}} is considered by Catalyst only if all > partitions implement {{HasPartitionKey}}. But this limits the number of keys > per partition to 1. > Spark should continue to support the more general situation of > {{KeyGroupedPartitioning}} with multiple keys per partition, like > {{HashPartitioning}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42623) parameter markers not blocked in DDL
[ https://issues.apache.org/jira/browse/SPARK-42623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42623: Assignee: Apache Spark > parameter markers not blocked in DDL > > > Key: SPARK-42623 > URL: https://issues.apache.org/jira/browse/SPARK-42623 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Assignee: Apache Spark >Priority: Major > > The parameterized query code does not block DDL statements from referencing > parameter markers. > E.g.: > > {code:java} > scala> spark.sql(sqlText = "CREATE VIEW v1 AS SELECT current_timestamp() + > :later as stamp, :x * :x AS square", args = Map("later" -> "INTERVAL'3' > HOUR", "x" -> "15.0")).show() > ++ > || > ++ > ++ > {code} > It appears we have some protection, but it fails only when the view is invoked: > > {code:java} > scala> spark.sql(sqlText = "SELECT * FROM v1", args = Map("later" -> > "INTERVAL'3' HOUR", "x" -> "15.0")).show() > org.apache.spark.sql.AnalysisException: [UNBOUND_SQL_PARAMETER] Found the > unbound parameter: `later`. Please, fix `args` and provide a mapping of the > parameter to a SQL literal.; line 1 pos 29 > {code} > Right now I think the affected statements are: > * DEFAULT definition > * VIEW definition > but any other future standard expression popping up is at risk, such as SQL > Functions, or GENERATED COLUMN. > CREATE TABLE AS is debatable, since it executes the query at definition time > only. > For simplicity I propose to block the feature from ANY DDL statement (CREATE, > ALTER). > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
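One possible shape of the proposed blocking, as a hypothetical sketch only (the expression and helper names here stand in for whatever the real patch uses): walk the parsed DDL plan and fail fast on any parameter marker before it can be persisted into a view or default definition.

{code:scala}
// Hypothetical helper: reject parameter markers under DDL plans such as
// CREATE VIEW or ALTER TABLE before any binding happens. `Parameter` here
// stands in for the unbound-marker expression node.
def assertNoParameterMarkers(plan: LogicalPlan): Unit = {
  plan.foreachUp { node =>
    node.expressions.foreach(_.foreach {
      case p: Parameter =>
        throw new AnalysisException(
          s"Named parameter markers are not allowed in DDL statements: ${p.name}")
      case _ => // not a parameter marker, keep scanning
    })
  }
}
{code}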
[jira] [Assigned] (SPARK-42623) parameter markers not blocked in DDL
[ https://issues.apache.org/jira/browse/SPARK-42623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42623: Assignee: (was: Apache Spark) > parameter markers not blocked in DDL > > > Key: SPARK-42623 > URL: https://issues.apache.org/jira/browse/SPARK-42623 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Priority: Major > > The parameterized query code does not block DDL statements from referencing > parameter markers. > E.g.: > > {code:java} > scala> spark.sql(sqlText = "CREATE VIEW v1 AS SELECT current_timestamp() + > :later as stamp, :x * :x AS square", args = Map("later" -> "INTERVAL'3' > HOUR", "x" -> "15.0")).show() > ++ > || > ++ > ++ > {code} > It appears we have some protection, but it fails only when the view is invoked: > > {code:java} > scala> spark.sql(sqlText = "SELECT * FROM v1", args = Map("later" -> > "INTERVAL'3' HOUR", "x" -> "15.0")).show() > org.apache.spark.sql.AnalysisException: [UNBOUND_SQL_PARAMETER] Found the > unbound parameter: `later`. Please, fix `args` and provide a mapping of the > parameter to a SQL literal.; line 1 pos 29 > {code} > Right now I think the affected statements are: > * DEFAULT definition > * VIEW definition > but any other future standard expression popping up is at risk, such as SQL > Functions, or GENERATED COLUMN. > CREATE TABLE AS is debatable, since it executes the query at definition time > only. > For simplicity I propose to block the feature from ANY DDL statement (CREATE, > ALTER). > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42623) parameter markers not blocked in DDL
[ https://issues.apache.org/jira/browse/SPARK-42623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697796#comment-17697796 ] Apache Spark commented on SPARK-42623: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/40333 > parameter markers not blocked in DDL > > > Key: SPARK-42623 > URL: https://issues.apache.org/jira/browse/SPARK-42623 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Priority: Major > > The parameterized query code does not block DDL statements from referencing > parameter markers. > E.g.: > > {code:java} > scala> spark.sql(sqlText = "CREATE VIEW v1 AS SELECT current_timestamp() + > :later as stamp, :x * :x AS square", args = Map("later" -> "INTERVAL'3' > HOUR", "x" -> "15.0")).show() > ++ > || > ++ > ++ > {code} > It appears we have some protection, but it fails only when the view is invoked: > > {code:java} > scala> spark.sql(sqlText = "SELECT * FROM v1", args = Map("later" -> > "INTERVAL'3' HOUR", "x" -> "15.0")).show() > org.apache.spark.sql.AnalysisException: [UNBOUND_SQL_PARAMETER] Found the > unbound parameter: `later`. Please, fix `args` and provide a mapping of the > parameter to a SQL literal.; line 1 pos 29 > {code} > Right now I think the affected statements are: > * DEFAULT definition > * VIEW definition > but any other future standard expression popping up is at risk, such as SQL > Functions, or GENERATED COLUMN. > CREATE TABLE AS is debatable, since it executes the query at definition time > only. > For simplicity I propose to block the feature from ANY DDL statement (CREATE, > ALTER). > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42702) Support parameterized CTE
[ https://issues.apache.org/jira/browse/SPARK-42702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42702: Assignee: Max Gekk (was: Apache Spark) > Support parameterized CTE > - > > Key: SPARK-42702 > URL: https://issues.apache.org/jira/browse/SPARK-42702 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Support named parameters in named common table expressions (CTE). At the > moment, such queries fail: > {code:java} > CREATE TABLE tbl(namespace STRING) USING parquet > INSERT INTO tbl SELECT 'abc' > WITH transitions AS ( > SELECT * FROM tbl WHERE namespace = :namespace > ) SELECT * FROM transitions {code} > with the following error: > {code:java} > [UNBOUND_SQL_PARAMETER] Found the unbound parameter: `namespace`. Please, fix > `args` and provide a mapping of the parameter to a SQL literal.; line 3 pos > 38; > 'WithCTE > :- 'CTERelationDef 0, false > : +- 'SubqueryAlias transitions > : +- 'Project [*] > : +- 'Filter (namespace#3 = parameter(namespace)) > : +- SubqueryAlias spark_catalog.default.tbl > : +- Relation spark_catalog.default.tbl[namespace#3] parquet > +- 'Project [*] > +- 'SubqueryAlias transitions > +- 'CTERelationRef 0, false org.apache.spark.sql.AnalysisException: > [UNBOUND_SQL_PARAMETER] Found the unbound parameter: `namespace`. Please, fix > `args` and provide a mapping of the parameter to a SQL literal.; line 3 pos > 38; > 'WithCTE > :- 'CTERelationDef 0, false > : +- 'SubqueryAlias transitions > : +- 'Project [*] > : +- 'Filter (namespace#3 = parameter(namespace)) > : +- SubqueryAlias spark_catalog.default.tbl > : +- Relation spark_catalog.default.tbl[namespace#3] parquet > +- 'Project [*] > +- 'SubqueryAlias transitions > +- 'CTERelationRef 0, false at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:52) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5(CheckAnalysis.scala:339) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5$adapted(CheckAnalysis.scala:244) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
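Once supported, the query from the description should work through the parameterized-SQL API added in 3.4, with the argument supplied as a SQL literal string:

{code:scala}
// The same CTE query, invoked with a bound named parameter; with the fix
// this should return the row 'abc' instead of failing with
// UNBOUND_SQL_PARAMETER.
spark.sql("CREATE TABLE tbl(namespace STRING) USING parquet")
spark.sql("INSERT INTO tbl SELECT 'abc'")
spark.sql(
  sqlText = """WITH transitions AS (
              |  SELECT * FROM tbl WHERE namespace = :namespace
              |) SELECT * FROM transitions""".stripMargin,
  args = Map("namespace" -> "'abc'")).show()
{code}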
[jira] [Assigned] (SPARK-42702) Support parameterized CTE
[ https://issues.apache.org/jira/browse/SPARK-42702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42702: Assignee: Apache Spark (was: Max Gekk) > Support parameterized CTE > - > > Key: SPARK-42702 > URL: https://issues.apache.org/jira/browse/SPARK-42702 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Support named parameters in named common table expressions (CTE). At the > moment, such queries fail: > {code:java} > CREATE TABLE tbl(namespace STRING) USING parquet > INSERT INTO tbl SELECT 'abc' > WITH transitions AS ( > SELECT * FROM tbl WHERE namespace = :namespace > ) SELECT * FROM transitions {code} > with the following error: > {code:java} > [UNBOUND_SQL_PARAMETER] Found the unbound parameter: `namespace`. Please, fix > `args` and provide a mapping of the parameter to a SQL literal.; line 3 pos > 38; > 'WithCTE > :- 'CTERelationDef 0, false > : +- 'SubqueryAlias transitions > : +- 'Project [*] > : +- 'Filter (namespace#3 = parameter(namespace)) > : +- SubqueryAlias spark_catalog.default.tbl > : +- Relation spark_catalog.default.tbl[namespace#3] parquet > +- 'Project [*] > +- 'SubqueryAlias transitions > +- 'CTERelationRef 0, false org.apache.spark.sql.AnalysisException: > [UNBOUND_SQL_PARAMETER] Found the unbound parameter: `namespace`. Please, fix > `args` and provide a mapping of the parameter to a SQL literal.; line 3 pos > 38; > 'WithCTE > :- 'CTERelationDef 0, false > : +- 'SubqueryAlias transitions > : +- 'Project [*] > : +- 'Filter (namespace#3 = parameter(namespace)) > : +- SubqueryAlias spark_catalog.default.tbl > : +- Relation spark_catalog.default.tbl[namespace#3] parquet > +- 'Project [*] > +- 'SubqueryAlias transitions > +- 'CTERelationRef 0, false at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:52) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5(CheckAnalysis.scala:339) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5$adapted(CheckAnalysis.scala:244) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42702) Support parameterized CTE
[ https://issues.apache.org/jira/browse/SPARK-42702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697794#comment-17697794 ] Apache Spark commented on SPARK-42702: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/40333 > Support parameterized CTE > - > > Key: SPARK-42702 > URL: https://issues.apache.org/jira/browse/SPARK-42702 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Support named parameters in named common table expressions (CTE). At the > moment, such queries fail: > {code:java} > CREATE TABLE tbl(namespace STRING) USING parquet > INSERT INTO tbl SELECT 'abc' > WITH transitions AS ( > SELECT * FROM tbl WHERE namespace = :namespace > ) SELECT * FROM transitions {code} > with the following error: > {code:java} > [UNBOUND_SQL_PARAMETER] Found the unbound parameter: `namespace`. Please, fix > `args` and provide a mapping of the parameter to a SQL literal.; line 3 pos > 38; > 'WithCTE > :- 'CTERelationDef 0, false > : +- 'SubqueryAlias transitions > : +- 'Project [*] > : +- 'Filter (namespace#3 = parameter(namespace)) > : +- SubqueryAlias spark_catalog.default.tbl > : +- Relation spark_catalog.default.tbl[namespace#3] parquet > +- 'Project [*] > +- 'SubqueryAlias transitions > +- 'CTERelationRef 0, false org.apache.spark.sql.AnalysisException: > [UNBOUND_SQL_PARAMETER] Found the unbound parameter: `namespace`. Please, fix > `args` and provide a mapping of the parameter to a SQL literal.; line 3 pos > 38; > 'WithCTE > :- 'CTERelationDef 0, false > : +- 'SubqueryAlias transitions > : +- 'Project [*] > : +- 'Filter (namespace#3 = parameter(namespace)) > : +- SubqueryAlias spark_catalog.default.tbl > : +- Relation spark_catalog.default.tbl[namespace#3] parquet > +- 'Project [*] > +- 'SubqueryAlias transitions > +- 'CTERelationRef 0, false at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:52) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5(CheckAnalysis.scala:339) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5$adapted(CheckAnalysis.scala:244) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42690) Implement CSV/JSON parsing functions
[ https://issues.apache.org/jira/browse/SPARK-42690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42690: Assignee: Apache Spark > Implement CSV/JSON parsing functions > --- > > Key: SPARK-42690 > URL: https://issues.apache.org/jira/browse/SPARK-42690 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Apache Spark >Priority: Major > > Implement the following two methods in DataFrameReader: > > > {code:java} > /** > * Loads a `Dataset[String]` storing JSON objects ([JSON Lines|http://jsonlines.org/] > * text format or newline-delimited JSON) and returns the result as a > `DataFrame`. > * > * Unless the schema is specified using the `schema` function, this function goes > through the > * input once to determine the input schema. > * > * @param jsonDataset input Dataset with one JSON object per record > * @since 3.4.0 > */ > def json(jsonDataset: Dataset[String]): DataFrame > /** > * Loads a `Dataset[String]` storing CSV rows and returns the result as a > `DataFrame`. > * > * If the schema is not specified using the `schema` function and the `inferSchema` > option is enabled, > * this function goes through the input once to determine the input schema. > * > * If the schema is not specified using the `schema` function and the `inferSchema` > option is disabled, > * it determines the columns as string types and it reads only the first line > to determine the > * names and the number of fields. > * > * If `enforceSchema` is set to `false`, only the CSV header in the first > line is checked > * to conform to the specified or inferred schema. > * > * @note if the `header` option is set to `true` when calling this API, all lines > matching > * the header will be removed if they exist. > * > * @param csvDataset input Dataset with one CSV row per record > * @since 3.4.0 > */ > def csv(csvDataset: Dataset[String]): DataFrame > {code} > > For this we need a new message. We cannot use Project because we don't know > the schema upfront. > > {code:java} > message Parse { > // (Required) Input relation to Parse. The input is expected to have a single > text column. > Relation input = 1; > // (Required) The expected format of the text. > ParseFormat format = 2; > enum ParseFormat { > PARSE_FORMAT_UNSPECIFIED = 0; > PARSE_FORMAT_CSV = 1; > PARSE_FORMAT_JSON = 2; > } > } > {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
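The classic (non-Connect) DataFrameReader already exposes exactly these methods, which gives a concrete picture of the intended usage. The sketch below runs against classic Spark; the ticket is about adding the same surface to Connect:

{code:scala}
import spark.implicits._

// JSON: one object per Dataset element; the schema is inferred by a pass
// over the input, as the scaladoc above describes.
val jsonLines = Seq("""{"a": 1, "b": "x"}""", """{"a": 2, "b": "y"}""").toDS()
spark.read.json(jsonLines).printSchema() // a: bigint, b: string

// CSV: the first element supplies the header when the option is set.
val csvLines = Seq("a,b", "1,x", "2,y").toDS()
spark.read.option("header", "true").option("inferSchema", "true").csv(csvLines).show()
{code}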
[jira] [Assigned] (SPARK-42690) Implement CSV/JSON parsing functions
[ https://issues.apache.org/jira/browse/SPARK-42690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42690: Assignee: (was: Apache Spark) > Implement CSV/JSON parsing functions > --- > > Key: SPARK-42690 > URL: https://issues.apache.org/jira/browse/SPARK-42690 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > > Implement the following two methods in DataFrameReader: > > > {code:java} > /** > * Loads a `Dataset[String]` storing JSON objects ([JSON Lines|http://jsonlines.org/] > * text format or newline-delimited JSON) and returns the result as a > `DataFrame`. > * > * Unless the schema is specified using the `schema` function, this function goes > through the > * input once to determine the input schema. > * > * @param jsonDataset input Dataset with one JSON object per record > * @since 3.4.0 > */ > def json(jsonDataset: Dataset[String]): DataFrame > /** > * Loads a `Dataset[String]` storing CSV rows and returns the result as a > `DataFrame`. > * > * If the schema is not specified using the `schema` function and the `inferSchema` > option is enabled, > * this function goes through the input once to determine the input schema. > * > * If the schema is not specified using the `schema` function and the `inferSchema` > option is disabled, > * it determines the columns as string types and it reads only the first line > to determine the > * names and the number of fields. > * > * If `enforceSchema` is set to `false`, only the CSV header in the first > line is checked > * to conform to the specified or inferred schema. > * > * @note if the `header` option is set to `true` when calling this API, all lines > matching > * the header will be removed if they exist. > * > * @param csvDataset input Dataset with one CSV row per record > * @since 3.4.0 > */ > def csv(csvDataset: Dataset[String]): DataFrame > {code} > > For this we need a new message. We cannot use Project because we don't know > the schema upfront. > > {code:java} > message Parse { > // (Required) Input relation to Parse. The input is expected to have a single > text column. > Relation input = 1; > // (Required) The expected format of the text. > ParseFormat format = 2; > enum ParseFormat { > PARSE_FORMAT_UNSPECIFIED = 0; > PARSE_FORMAT_CSV = 1; > PARSE_FORMAT_JSON = 2; > } > } > {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42690) Implement CSV/JSON parsing functions
[ https://issues.apache.org/jira/browse/SPARK-42690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697786#comment-17697786 ] Apache Spark commented on SPARK-42690: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40332 > Implement CSV/JSON parsing functions > --- > > Key: SPARK-42690 > URL: https://issues.apache.org/jira/browse/SPARK-42690 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > > Implement the following two methods in DataFrameReader: > > > {code:java} > /** > * Loads a `Dataset[String]` storing JSON objects ([JSON Lines|http://jsonlines.org/] > * text format or newline-delimited JSON) and returns the result as a > `DataFrame`. > * > * Unless the schema is specified using the `schema` function, this function goes > through the > * input once to determine the input schema. > * > * @param jsonDataset input Dataset with one JSON object per record > * @since 3.4.0 > */ > def json(jsonDataset: Dataset[String]): DataFrame > /** > * Loads a `Dataset[String]` storing CSV rows and returns the result as a > `DataFrame`. > * > * If the schema is not specified using the `schema` function and the `inferSchema` > option is enabled, > * this function goes through the input once to determine the input schema. > * > * If the schema is not specified using the `schema` function and the `inferSchema` > option is disabled, > * it determines the columns as string types and it reads only the first line > to determine the > * names and the number of fields. > * > * If `enforceSchema` is set to `false`, only the CSV header in the first > line is checked > * to conform to the specified or inferred schema. > * > * @note if the `header` option is set to `true` when calling this API, all lines > matching > * the header will be removed if they exist. > * > * @param csvDataset input Dataset with one CSV row per record > * @since 3.4.0 > */ > def csv(csvDataset: Dataset[String]): DataFrame > {code} > > For this we need a new message. We cannot use Project because we don't know > the schema upfront. > > {code:java} > message Parse { > // (Required) Input relation to Parse. The input is expected to have a single > text column. > Relation input = 1; > // (Required) The expected format of the text. > ParseFormat format = 2; > enum ParseFormat { > PARSE_FORMAT_UNSPECIFIED = 0; > PARSE_FORMAT_CSV = 1; > PARSE_FORMAT_JSON = 2; > } > } > {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42713) Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference
[ https://issues.apache.org/jira/browse/SPARK-42713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697756#comment-17697756 ] Apache Spark commented on SPARK-42713: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/40331 > Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference > > > Key: SPARK-42713 > URL: https://issues.apache.org/jira/browse/SPARK-42713 > Project: Spark > Issue Type: Documentation > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
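For context on what the new API reference entries would document, these dunder methods are what make attribute-style and index-style column access work in PySpark. A short, self-contained example (standard PySpark API):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a")], ["id", "value"])

# DataFrame.__getattr__: attribute-style column access
# DataFrame.__getitem__: index-style column access
df.select(df.id, df["value"]).show()

# Column.__getitem__: item access into complex types (maps, arrays)
mdf = spark.createDataFrame([({"k": "v"},)], ["m"])
mdf.select(mdf.m["k"]).show()
{code}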
[jira] [Assigned] (SPARK-42713) Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference
[ https://issues.apache.org/jira/browse/SPARK-42713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42713: Assignee: Apache Spark > Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference > > > Key: SPARK-42713 > URL: https://issues.apache.org/jira/browse/SPARK-42713 > Project: Spark > Issue Type: Documentation > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42713) Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference
[ https://issues.apache.org/jira/browse/SPARK-42713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42713: Assignee: (was: Apache Spark) > Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference > > > Key: SPARK-42713 > URL: https://issues.apache.org/jira/browse/SPARK-42713 > Project: Spark > Issue Type: Documentation > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42713) Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference
[ https://issues.apache.org/jira/browse/SPARK-42713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697755#comment-17697755 ] Apache Spark commented on SPARK-42713: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/40331 > Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference > > > Key: SPARK-42713 > URL: https://issues.apache.org/jira/browse/SPARK-42713 > Project: Spark > Issue Type: Documentation > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42712) Improve docstring of mapInPandas and mapInArrow
[ https://issues.apache.org/jira/browse/SPARK-42712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42712: Assignee: Apache Spark
> Improve docstring of mapInPandas and mapInArrow
> ---
>
> Key: SPARK-42712
> URL: https://issues.apache.org/jira/browse/SPARK-42712
> Project: Spark
> Issue Type: Improvement
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Xinrong Meng
> Assignee: Apache Spark
> Priority: Major
>
> We'd better call out that they are not scalar.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
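The "not scalar" point is that these functions consume and produce iterators of batches rather than one value per row; the output need not have the same length as the input. A minimal example of the shape the improved docstring needs to convey (standard PySpark API):
{code:python}
from typing import Iterator
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(4)

# The function receives an iterator of pandas DataFrames (one per batch),
# not a scalar per row, and may yield DataFrames of arbitrary length.
def keep_even(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    for pdf in batches:
        yield pdf[pdf.id % 2 == 0]

df.mapInPandas(keep_even, schema="id long").show()
{code}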
[jira] [Commented] (SPARK-42712) Improve docstring of mapInPandas and mapInArrow
[ https://issues.apache.org/jira/browse/SPARK-42712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697752#comment-17697752 ] Apache Spark commented on SPARK-42712: -- User 'xinrong-meng' has created a pull request for this issue: https://github.com/apache/spark/pull/40330
> Improve docstring of mapInPandas and mapInArrow
> ---
>
> Key: SPARK-42712
> URL: https://issues.apache.org/jira/browse/SPARK-42712
> Project: Spark
> Issue Type: Improvement
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Xinrong Meng
> Priority: Major
>
> We'd better call out that they are not scalar.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42712) Improve docstring of mapInPandas and mapInArrow
[ https://issues.apache.org/jira/browse/SPARK-42712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42712: Assignee: (was: Apache Spark)
> Improve docstring of mapInPandas and mapInArrow
> ---
>
> Key: SPARK-42712
> URL: https://issues.apache.org/jira/browse/SPARK-42712
> Project: Spark
> Issue Type: Improvement
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Xinrong Meng
> Priority: Major
>
> We'd better call out that they are not scalar.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42710) Rename FrameMap proto to MapPartitions
[ https://issues.apache.org/jira/browse/SPARK-42710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42710: Assignee: Apache Spark > Rename FrameMap proto to MapPartitions > -- > > Key: SPARK-42710 > URL: https://issues.apache.org/jira/browse/SPARK-42710 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > For readability. > Frame Map API refers to mapInPandas and mapInArrow, which are equivalent to > MapPartitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42710) Rename FrameMap proto to MapPartitions
[ https://issues.apache.org/jira/browse/SPARK-42710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42710: Assignee: (was: Apache Spark) > Rename FrameMap proto to MapPartitions > -- > > Key: SPARK-42710 > URL: https://issues.apache.org/jira/browse/SPARK-42710 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > For readability. > Frame Map API refers to mapInPandas and mapInArrow, which are equivalent to > MapPartitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42710) Rename FrameMap proto to MapPartitions
[ https://issues.apache.org/jira/browse/SPARK-42710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697748#comment-17697748 ] Apache Spark commented on SPARK-42710: -- User 'xinrong-meng' has created a pull request for this issue: https://github.com/apache/spark/pull/40329 > Rename FrameMap proto to MapPartitions > -- > > Key: SPARK-42710 > URL: https://issues.apache.org/jira/browse/SPARK-42710 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > For readability. > Frame Map API refers to mapInPandas and mapInArrow, which are equivalent to > MapPartitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42709) Do not rely on __file__
[ https://issues.apache.org/jira/browse/SPARK-42709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42709: Assignee: (was: Apache Spark)
> Do not rely on __file__
> ---
>
> Key: SPARK-42709
> URL: https://issues.apache.org/jira/browse/SPARK-42709
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> We have a lot of places using __file__, which is actually optional. We shouldn't rely on it.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
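`__file__` is indeed optional in the Python data model (frozen or embedded modules may not set it), so call sites need a fallback. A sketch of the defensive pattern such a fix implies; the exact fallback chosen in the PR may differ:
{code:python}
from importlib.util import find_spec

import pyspark

# __file__ may be absent, so use getattr with a default instead of
# direct attribute access.
spark_file = getattr(pyspark, "__file__", None)
if spark_file is not None:
    print("pyspark loaded from", spark_file)
else:
    # Fall back to a mechanism that does not rely on __file__.
    spec = find_spec("pyspark")
    print("pyspark origin:", spec.origin if spec else "unknown")
{code}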
[jira] [Assigned] (SPARK-42709) Do not rely on __file__
[ https://issues.apache.org/jira/browse/SPARK-42709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42709: Assignee: Apache Spark
> Do not rely on __file__
> ---
>
> Key: SPARK-42709
> URL: https://issues.apache.org/jira/browse/SPARK-42709
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Apache Spark
> Priority: Major
>
> We have a lot of places using __file__, which is actually optional. We shouldn't rely on it.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42709) Do not rely on __file__
[ https://issues.apache.org/jira/browse/SPARK-42709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697734#comment-17697734 ] Apache Spark commented on SPARK-42709: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/40328
> Do not rely on __file__
> ---
>
> Key: SPARK-42709
> URL: https://issues.apache.org/jira/browse/SPARK-42709
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> We have a lot of places using __file__, which is actually optional. We shouldn't rely on it.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42266) Local mode should work with IPython
[ https://issues.apache.org/jira/browse/SPARK-42266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697727#comment-17697727 ] Apache Spark commented on SPARK-42266: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/40327
> Local mode should work with IPython
> ---
>
> Key: SPARK-42266
> URL: https://issues.apache.org/jira/browse/SPARK-42266
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Major
>
> {code:java}
> (spark_dev) ➜ spark git:(master) bin/pyspark --remote "local[*]"
> Python 3.9.15 (main, Nov 24 2022, 08:28:41)
> Type 'copyright', 'credits' or 'license' for more information
> IPython 8.9.0 -- An enhanced Interactive Python. Type '?' for help.
> /Users/ruifeng.zheng/Dev/spark/python/pyspark/shell.py:45: UserWarning: Failed to initialize Spark session.
>   warnings.warn("Failed to initialize Spark session.")
> Traceback (most recent call last):
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/shell.py", line 40, in <module>
>     spark = SparkSession.builder.getOrCreate()
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/session.py", line 429, in getOrCreate
>     from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/__init__.py", line 21, in <module>
>     from pyspark.sql.connect.dataframe import DataFrame  # noqa: F401
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/dataframe.py", line 35, in <module>
>     import pandas
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/__init__.py", line 29, in <module>
>     from pyspark.pandas.missing.general_functions import MissingPandasLikeGeneralFunctions
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/__init__.py", line 34, in <module>
>     require_minimum_pandas_version()
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/pandas/utils.py", line 37, in require_minimum_pandas_version
>     if LooseVersion(pandas.__version__) < LooseVersion(minimum_pandas_version):
> AttributeError: partially initialized module 'pandas' has no attribute '__version__' (most likely due to a circular import)
> [TerminalIPythonApp] WARNING | Unknown error in handling PYTHONSTARTUP file /Users/ruifeng.zheng/Dev/spark//python/pyspark/shell.py:
> ---
> AttributeError        Traceback (most recent call last)
> File ~/Dev/spark/python/pyspark/shell.py:40
>      38 try:
>      39     # Creates pyspark.sql.connect.SparkSession.
> ---> 40     spark = SparkSession.builder.getOrCreate()
>      41 except Exception:
> File ~/Dev/spark/python/pyspark/sql/session.py:429, in SparkSession.Builder.getOrCreate(self)
>     428 with SparkContext._lock:
> --> 429     from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
>     431 if (
>     432     SparkContext._active_spark_context is None
>     433     and SparkSession._instantiatedSession is None
>     434 ):
> File ~/Dev/spark/python/pyspark/sql/connect/__init__.py:21
>      18 """Currently Spark Connect is very experimental and the APIs to interact with
>      19 Spark through this API are can be changed at any time without warning."""
> ---> 21 from pyspark.sql.connect.dataframe import DataFrame  # noqa: F401
>      22 from pyspark.sql.pandas.utils import (
>      23     require_minimum_pandas_version,
>      24     require_minimum_pyarrow_version,
>      25     require_minimum_grpc_version,
>      26 )
> File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:35
>      34 import random
> ---> 35 import pandas
>      36 import json
> File ~/Dev/spark/python/pyspark/pandas/__init__.py:29
>      27 from typing import Any
> ---> 29 from pyspark.pandas.missing.general_functions import MissingPandasLikeGeneralFunctions
>      30 from pyspark.pandas.missing.scalars import MissingPandasLikeScalars
> File ~/Dev/spark/python/pyspark/pandas/__init__.py:34
>      33 try:
> ---> 34     require_minimum_pandas_version()
>      35     require_minimum_pyarrow_version()
> File ~/Dev/spark/python/pyspark/sql/pandas/utils.py:37, in require_minimum_pandas_version()
>      34     raise ImportError(
>      35         "Pandas >= %s must be installed; however, " "it was not found." % minimum_pandas_version
>      36     ) from raised_error
> ---> 37 if LooseVersion(pandas.__version__) < LooseVersion(minimum_pandas_version):
>      38     raise ImportError(
>      39         "Pandas >= %s must be installed; however, "
>      40         "your version was %s." % (minimum_pandas_version, pandas.__version__)
>      41     )
> {code}
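The root cause visible in the traceback is that importing pyspark.sql.connect.dataframe pulls in pandas at module import time while pyspark.pandas is still initializing, producing the circular import. One standard remedy, shown here only as a sketch and not necessarily the fix taken in the PR, is to defer the heavy import into the function that needs it:
{code:python}
# A module-level `import pandas` can participate in a circular import when
# this module is itself imported during pyspark.pandas initialization.
# Deferring the import breaks the cycle:

def to_pandas_batch(rows):
    import pandas as pd  # imported lazily, only when actually needed
    return pd.DataFrame(rows)
{code}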
[jira] [Assigned] (SPARK-42266) Local mode should work with IPython
[ https://issues.apache.org/jira/browse/SPARK-42266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42266: Assignee: Apache Spark
> Local mode should work with IPython
> ---
>
> Key: SPARK-42266
> URL: https://issues.apache.org/jira/browse/SPARK-42266
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Apache Spark
> Priority: Major
>
> {code:java}
> (spark_dev) ➜ spark git:(master) bin/pyspark --remote "local[*]"
> Python 3.9.15 (main, Nov 24 2022, 08:28:41)
> Type 'copyright', 'credits' or 'license' for more information
> IPython 8.9.0 -- An enhanced Interactive Python. Type '?' for help.
> /Users/ruifeng.zheng/Dev/spark/python/pyspark/shell.py:45: UserWarning: Failed to initialize Spark session.
>   warnings.warn("Failed to initialize Spark session.")
> Traceback (most recent call last):
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/shell.py", line 40, in <module>
>     spark = SparkSession.builder.getOrCreate()
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/session.py", line 429, in getOrCreate
>     from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/__init__.py", line 21, in <module>
>     from pyspark.sql.connect.dataframe import DataFrame  # noqa: F401
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/dataframe.py", line 35, in <module>
>     import pandas
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/__init__.py", line 29, in <module>
>     from pyspark.pandas.missing.general_functions import MissingPandasLikeGeneralFunctions
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/__init__.py", line 34, in <module>
>     require_minimum_pandas_version()
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/pandas/utils.py", line 37, in require_minimum_pandas_version
>     if LooseVersion(pandas.__version__) < LooseVersion(minimum_pandas_version):
> AttributeError: partially initialized module 'pandas' has no attribute '__version__' (most likely due to a circular import)
> [TerminalIPythonApp] WARNING | Unknown error in handling PYTHONSTARTUP file /Users/ruifeng.zheng/Dev/spark//python/pyspark/shell.py:
> ---
> AttributeError        Traceback (most recent call last)
> File ~/Dev/spark/python/pyspark/shell.py:40
>      38 try:
>      39     # Creates pyspark.sql.connect.SparkSession.
> ---> 40     spark = SparkSession.builder.getOrCreate()
>      41 except Exception:
> File ~/Dev/spark/python/pyspark/sql/session.py:429, in SparkSession.Builder.getOrCreate(self)
>     428 with SparkContext._lock:
> --> 429     from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
>     431 if (
>     432     SparkContext._active_spark_context is None
>     433     and SparkSession._instantiatedSession is None
>     434 ):
> File ~/Dev/spark/python/pyspark/sql/connect/__init__.py:21
>      18 """Currently Spark Connect is very experimental and the APIs to interact with
>      19 Spark through this API are can be changed at any time without warning."""
> ---> 21 from pyspark.sql.connect.dataframe import DataFrame  # noqa: F401
>      22 from pyspark.sql.pandas.utils import (
>      23     require_minimum_pandas_version,
>      24     require_minimum_pyarrow_version,
>      25     require_minimum_grpc_version,
>      26 )
> File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:35
>      34 import random
> ---> 35 import pandas
>      36 import json
> File ~/Dev/spark/python/pyspark/pandas/__init__.py:29
>      27 from typing import Any
> ---> 29 from pyspark.pandas.missing.general_functions import MissingPandasLikeGeneralFunctions
>      30 from pyspark.pandas.missing.scalars import MissingPandasLikeScalars
> File ~/Dev/spark/python/pyspark/pandas/__init__.py:34
>      33 try:
> ---> 34     require_minimum_pandas_version()
>      35     require_minimum_pyarrow_version()
> File ~/Dev/spark/python/pyspark/sql/pandas/utils.py:37, in require_minimum_pandas_version()
>      34     raise ImportError(
>      35         "Pandas >= %s must be installed; however, " "it was not found." % minimum_pandas_version
>      36     ) from raised_error
> ---> 37 if LooseVersion(pandas.__version__) < LooseVersion(minimum_pandas_version):
>      38     raise ImportError(
>      39         "Pandas >= %s must be installed; however, "
>      40         "your version was %s." % (minimum_pandas_version, pandas.__version__)
>      41     )
> AttributeError: partially initialized module 'pandas' has no attribute
> {code}
[jira] [Assigned] (SPARK-42266) Local mode should work with IPython
[ https://issues.apache.org/jira/browse/SPARK-42266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42266: Assignee: (was: Apache Spark)
> Local mode should work with IPython
> ---
>
> Key: SPARK-42266
> URL: https://issues.apache.org/jira/browse/SPARK-42266
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Major
>
> {code:java}
> (spark_dev) ➜ spark git:(master) bin/pyspark --remote "local[*]"
> Python 3.9.15 (main, Nov 24 2022, 08:28:41)
> Type 'copyright', 'credits' or 'license' for more information
> IPython 8.9.0 -- An enhanced Interactive Python. Type '?' for help.
> /Users/ruifeng.zheng/Dev/spark/python/pyspark/shell.py:45: UserWarning: Failed to initialize Spark session.
>   warnings.warn("Failed to initialize Spark session.")
> Traceback (most recent call last):
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/shell.py", line 40, in <module>
>     spark = SparkSession.builder.getOrCreate()
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/session.py", line 429, in getOrCreate
>     from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/__init__.py", line 21, in <module>
>     from pyspark.sql.connect.dataframe import DataFrame  # noqa: F401
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/dataframe.py", line 35, in <module>
>     import pandas
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/__init__.py", line 29, in <module>
>     from pyspark.pandas.missing.general_functions import MissingPandasLikeGeneralFunctions
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/__init__.py", line 34, in <module>
>     require_minimum_pandas_version()
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/pandas/utils.py", line 37, in require_minimum_pandas_version
>     if LooseVersion(pandas.__version__) < LooseVersion(minimum_pandas_version):
> AttributeError: partially initialized module 'pandas' has no attribute '__version__' (most likely due to a circular import)
> [TerminalIPythonApp] WARNING | Unknown error in handling PYTHONSTARTUP file /Users/ruifeng.zheng/Dev/spark//python/pyspark/shell.py:
> ---
> AttributeError        Traceback (most recent call last)
> File ~/Dev/spark/python/pyspark/shell.py:40
>      38 try:
>      39     # Creates pyspark.sql.connect.SparkSession.
> ---> 40     spark = SparkSession.builder.getOrCreate()
>      41 except Exception:
> File ~/Dev/spark/python/pyspark/sql/session.py:429, in SparkSession.Builder.getOrCreate(self)
>     428 with SparkContext._lock:
> --> 429     from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
>     431 if (
>     432     SparkContext._active_spark_context is None
>     433     and SparkSession._instantiatedSession is None
>     434 ):
> File ~/Dev/spark/python/pyspark/sql/connect/__init__.py:21
>      18 """Currently Spark Connect is very experimental and the APIs to interact with
>      19 Spark through this API are can be changed at any time without warning."""
> ---> 21 from pyspark.sql.connect.dataframe import DataFrame  # noqa: F401
>      22 from pyspark.sql.pandas.utils import (
>      23     require_minimum_pandas_version,
>      24     require_minimum_pyarrow_version,
>      25     require_minimum_grpc_version,
>      26 )
> File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:35
>      34 import random
> ---> 35 import pandas
>      36 import json
> File ~/Dev/spark/python/pyspark/pandas/__init__.py:29
>      27 from typing import Any
> ---> 29 from pyspark.pandas.missing.general_functions import MissingPandasLikeGeneralFunctions
>      30 from pyspark.pandas.missing.scalars import MissingPandasLikeScalars
> File ~/Dev/spark/python/pyspark/pandas/__init__.py:34
>      33 try:
> ---> 34     require_minimum_pandas_version()
>      35     require_minimum_pyarrow_version()
> File ~/Dev/spark/python/pyspark/sql/pandas/utils.py:37, in require_minimum_pandas_version()
>      34     raise ImportError(
>      35         "Pandas >= %s must be installed; however, " "it was not found." % minimum_pandas_version
>      36     ) from raised_error
> ---> 37 if LooseVersion(pandas.__version__) < LooseVersion(minimum_pandas_version):
>      38     raise ImportError(
>      39         "Pandas >= %s must be installed; however, "
>      40         "your version was %s." % (minimum_pandas_version, pandas.__version__)
>      41     )
> AttributeError: partially initialized module 'pandas' has no attribute '__version__' (most likely
> {code}
[jira] [Assigned] (SPARK-42708) The generated protobuf java file is too large
[ https://issues.apache.org/jira/browse/SPARK-42708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42708: Assignee: (was: Apache Spark)
> The generated protobuf java file is too large
> ---
>
> Key: SPARK-42708
> URL: https://issues.apache.org/jira/browse/SPARK-42708
> Project: Spark
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 3.5.0
> Reporter: Jia Fan
> Priority: Trivial
>
> The generated protobuf Java files in our project are too large to be indexed by IDEA, so I can't run the program from IDEA. The way to fix this is to change the IDEA idea.max.intellisense.filesize value to 1. I couldn't find how to fix this in the project README before searching for it, so I want to document it for other newcomers to Spark.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
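For reference, idea.max.intellisense.filesize is a real IntelliJ IDEA custom property (Help > Edit Custom Properties...). The exact limit in the report above is truncated, so the value below is only an illustrative placeholder:
{code}
# idea.properties — raise the maximum file size (in KB) that IntelliJ will
# index/inspect, so the generated protobuf Java files get code insight.
# 10000 is an illustrative value, not the one from the report.
idea.max.intellisense.filesize=10000
{code}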
[jira] [Commented] (SPARK-42708) The generated protobuf java file is too large
[ https://issues.apache.org/jira/browse/SPARK-42708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697719#comment-17697719 ] Apache Spark commented on SPARK-42708: -- User 'Hisoka-X' has created a pull request for this issue: https://github.com/apache/spark/pull/40326
> The generated protobuf java file is too large
> ---
>
> Key: SPARK-42708
> URL: https://issues.apache.org/jira/browse/SPARK-42708
> Project: Spark
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 3.5.0
> Reporter: Jia Fan
> Priority: Trivial
>
> The generated protobuf Java files in our project are too large to be indexed by IDEA, so I can't run the program from IDEA. The way to fix this is to change the IDEA idea.max.intellisense.filesize value to 1. I couldn't find how to fix this in the project README before searching for it, so I want to document it for other newcomers to Spark.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42708) The generated protobuf java file is too large
[ https://issues.apache.org/jira/browse/SPARK-42708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42708: Assignee: Apache Spark
> The generated protobuf java file is too large
> ---
>
> Key: SPARK-42708
> URL: https://issues.apache.org/jira/browse/SPARK-42708
> Project: Spark
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 3.5.0
> Reporter: Jia Fan
> Assignee: Apache Spark
> Priority: Trivial
>
> The generated protobuf Java files in our project are too large to be indexed by IDEA, so I can't run the program from IDEA. The way to fix this is to change the IDEA idea.max.intellisense.filesize value to 1. I couldn't find how to fix this in the project README before searching for it, so I want to document it for other newcomers to Spark.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42707) Remove experimental warning in developer documentation
[ https://issues.apache.org/jira/browse/SPARK-42707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697717#comment-17697717 ] Apache Spark commented on SPARK-42707: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/40325
> Remove experimental warning in developer documentation
> ---
>
> Key: SPARK-42707
> URL: https://issues.apache.org/jira/browse/SPARK-42707
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> {code}
> diff --git a/connector/connect/README.md b/connector/connect/README.md
> index 6567daf5504..dfe49cea3df 100644
> --- a/connector/connect/README.md
> +++ b/connector/connect/README.md
> @@ -1,8 +1,5 @@
>  # Spark Connect
> -**Spark Connect is a strictly experimental feature and under heavy development.
> -All APIs should be considered volatile and should not be used in production.**
> -
>  This module contains the implementation of Spark Connect which is a logical plan
>  facade for the implementation in Spark. Spark Connect is directly integrated
>  into the build of Spark.
> diff --git a/python/pyspark/sql/connect/__init__.py b/python/pyspark/sql/connect/__init__.py
> index 9bd4513db22..8b5d30e214c 100644
> --- a/python/pyspark/sql/connect/__init__.py
> +++ b/python/pyspark/sql/connect/__init__.py
> @@ -15,5 +15,4 @@
>  # limitations under the License.
>  #
> -"""Currently Spark Connect is very experimental and the APIs to interact with
> -Spark through this API are can be changed at any time without warning."""
> +"""Spark Connect client"""
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42707) Remove experimental warning in developer documentation
[ https://issues.apache.org/jira/browse/SPARK-42707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42707: Assignee: Apache Spark
> Remove experimental warning in developer documentation
> ---
>
> Key: SPARK-42707
> URL: https://issues.apache.org/jira/browse/SPARK-42707
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Apache Spark
> Priority: Major
>
> {code}
> diff --git a/connector/connect/README.md b/connector/connect/README.md
> index 6567daf5504..dfe49cea3df 100644
> --- a/connector/connect/README.md
> +++ b/connector/connect/README.md
> @@ -1,8 +1,5 @@
>  # Spark Connect
> -**Spark Connect is a strictly experimental feature and under heavy development.
> -All APIs should be considered volatile and should not be used in production.**
> -
>  This module contains the implementation of Spark Connect which is a logical plan
>  facade for the implementation in Spark. Spark Connect is directly integrated
>  into the build of Spark.
> diff --git a/python/pyspark/sql/connect/__init__.py b/python/pyspark/sql/connect/__init__.py
> index 9bd4513db22..8b5d30e214c 100644
> --- a/python/pyspark/sql/connect/__init__.py
> +++ b/python/pyspark/sql/connect/__init__.py
> @@ -15,5 +15,4 @@
>  # limitations under the License.
>  #
> -"""Currently Spark Connect is very experimental and the APIs to interact with
> -Spark through this API are can be changed at any time without warning."""
> +"""Spark Connect client"""
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42707) Remove experimental warning in developer documentation
[ https://issues.apache.org/jira/browse/SPARK-42707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42707: Assignee: (was: Apache Spark)
> Remove experimental warning in developer documentation
> ---
>
> Key: SPARK-42707
> URL: https://issues.apache.org/jira/browse/SPARK-42707
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> {code}
> diff --git a/connector/connect/README.md b/connector/connect/README.md
> index 6567daf5504..dfe49cea3df 100644
> --- a/connector/connect/README.md
> +++ b/connector/connect/README.md
> @@ -1,8 +1,5 @@
>  # Spark Connect
> -**Spark Connect is a strictly experimental feature and under heavy development.
> -All APIs should be considered volatile and should not be used in production.**
> -
>  This module contains the implementation of Spark Connect which is a logical plan
>  facade for the implementation in Spark. Spark Connect is directly integrated
>  into the build of Spark.
> diff --git a/python/pyspark/sql/connect/__init__.py b/python/pyspark/sql/connect/__init__.py
> index 9bd4513db22..8b5d30e214c 100644
> --- a/python/pyspark/sql/connect/__init__.py
> +++ b/python/pyspark/sql/connect/__init__.py
> @@ -15,5 +15,4 @@
>  # limitations under the License.
>  #
> -"""Currently Spark Connect is very experimental and the APIs to interact with
> -Spark through this API are can be changed at any time without warning."""
> +"""Spark Connect client"""
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42496) Introduction Spark Connect at main page.
[ https://issues.apache.org/jira/browse/SPARK-42496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697699#comment-17697699 ] Apache Spark commented on SPARK-42496: -- User 'allanf-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40324
> Introduction Spark Connect at main page.
> ---
>
> Key: SPARK-42496
> URL: https://issues.apache.org/jira/browse/SPARK-42496
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Documentation
> Affects Versions: 3.4.0
> Reporter: Haejoon Lee
> Priority: Major
>
> We should document an introduction to Spark Connect on the PySpark main documentation page to give users a summary.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42705) SparkSession.sql doesn't return values from commands.
[ https://issues.apache.org/jira/browse/SPARK-42705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42705: Assignee: Apache Spark > SparkSession.sql doesn't return values from commands. > - > > Key: SPARK-42705 > URL: https://issues.apache.org/jira/browse/SPARK-42705 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > > {code:python} > >>> spark.sql("show functions").show() > ++ > |function| > ++ > ++ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42705) SparkSession.sql doesn't return values from commands.
[ https://issues.apache.org/jira/browse/SPARK-42705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697660#comment-17697660 ] Apache Spark commented on SPARK-42705: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40323 > SparkSession.sql doesn't return values from commands. > - > > Key: SPARK-42705 > URL: https://issues.apache.org/jira/browse/SPARK-42705 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > > {code:python} > >>> spark.sql("show functions").show() > ++ > |function| > ++ > ++ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42705) SparkSession.sql doesn't return values from commands.
[ https://issues.apache.org/jira/browse/SPARK-42705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42705: Assignee: (was: Apache Spark) > SparkSession.sql doesn't return values from commands. > - > > Key: SPARK-42705 > URL: https://issues.apache.org/jira/browse/SPARK-42705 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > > {code:python} > >>> spark.sql("show functions").show() > ++ > |function| > ++ > ++ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41775) Implement training functions as input
[ https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697658#comment-17697658 ] Apache Spark commented on SPARK-41775: -- User 'rithwik-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40322
> Implement training functions as input
> ---
>
> Key: SPARK-41775
> URL: https://issues.apache.org/jira/browse/SPARK-41775
> Project: Spark
> Issue Type: Sub-task
> Components: ML, PySpark
> Affects Versions: 3.4.0
> Reporter: Rithwik Ediga Lakhamsani
> Assignee: Rithwik Ediga Lakhamsani
> Priority: Major
> Fix For: 3.4.0
>
> Sidenote: make the formatting updates described in https://github.com/apache/spark/pull/39188
>
> Currently, `Distributor().run(...)` takes only files as input. Now we will add additional functionality to take in functions as well. This will require us to go through the following process on each task on the executor nodes:
> 1. Take the input function and args and pickle them
> 2. Create a temp train.py file that looks like:
> {code:java}
> import cloudpickle
> import os
>
> if __name__ == "__main__":
>     with open(f"{tempdir}/train_input.pkl", "rb") as f:
>         train, args = cloudpickle.load(f)
>     output = train(*args)
>     if output and os.environ.get("RANK", "") == "0":  # this is for partitionId == 0
>         with open(f"{tempdir}/train_output.pkl", "wb") as f:
>             cloudpickle.dump(output, f)
> {code}
> 3. Run that train.py file with `torchrun`
> 4. Check whether `train_output.pkl` has been created by the process with partitionId == 0; if it has, deserialize it and return that output through `.collect()`
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
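The flip side of step 2 is the task-side code that writes train_input.pkl and reads train_output.pkl back. A minimal sketch under the same assumptions as the description; the function name run_function and the file layout are illustrative, not the PR's actual helpers:
{code:python}
import os
import subprocess

import cloudpickle


def run_function(train, args, tempdir):
    # Step 1: pickle the user function and its args for the generated train.py
    with open(os.path.join(tempdir, "train_input.pkl"), "wb") as f:
        cloudpickle.dump((train, args), f)
    # Step 3: launch the generated script with torchrun
    subprocess.run(["torchrun", os.path.join(tempdir, "train.py")], check=True)
    # Step 4: if the RANK-0 process produced an output, read it back
    out_path = os.path.join(tempdir, "train_output.pkl")
    if os.path.exists(out_path):
        with open(out_path, "rb") as f:
            return cloudpickle.load(f)
    return None
{code}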
[jira] [Assigned] (SPARK-42704) SubqueryAlias should propagate metadata columns its child already selects
[ https://issues.apache.org/jira/browse/SPARK-42704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42704: Assignee: Apache Spark
> SubqueryAlias should propagate metadata columns its child already selects
> ---
>
> Key: SPARK-42704
> URL: https://issues.apache.org/jira/browse/SPARK-42704
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.3.2, 3.4.0
> Reporter: Ryan Johnson
> Assignee: Apache Spark
> Priority: Major
>
> The `AddMetadataColumns` analyzer rule intends to make available metadata columns resolvable, even if the plan already contains projections that did not explicitly mention the metadata column.
> The `SubqueryAlias` plan node intentionally does not propagate metadata columns automatically from a non-leaf/non-subquery child node, because the following should _not_ work:
>
> {code:java}
> spark.read.table("t").select("a", "b").as("s").select("_metadata"){code}
> However, today it is too strict and breaks the metadata chain when the child node's output already includes the metadata column:
>
> {code:java}
> // expected to work (and does)
> spark.read.table("t")
>   .select("a", "b").select("_metadata")
>
> // by extension, should also work (but does not)
> spark.read.table("t").select("a", "b", "_metadata").as("s")
>   .select("a", "b").select("_metadata"){code}
> The solution is for `SubqueryAlias` to always propagate metadata columns that are already in the child's output, thus preserving the `metadataOutput` chain for that column.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
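A sketch of what "propagate metadata columns already in the child's output" could look like in SubqueryAlias.metadataOutput. This is an illustration of the rule, not the merged patch: isMetadataCol stands in for however Spark marks metadata attributes, while the qualifier handling mirrors what SubqueryAlias does for its regular output:
{code:scala}
// Simplified sketch: re-qualify and expose only those metadata columns
// that the child's output already selects, keeping the subquery barrier
// for metadata columns the child does NOT select.
override def metadataOutput: Seq[Attribute] = {
  val qualifierList = identifier.qualifier :+ alias
  child.output.filter(isMetadataCol).map(_.withQualifier(qualifierList))
}
{code}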
[jira] [Commented] (SPARK-42704) SubqueryAlias should propagate metadata columns its child already selects
[ https://issues.apache.org/jira/browse/SPARK-42704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697565#comment-17697565 ] Apache Spark commented on SPARK-42704: -- User 'ryan-johnson-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/40321
> SubqueryAlias should propagate metadata columns its child already selects
> ---
>
> Key: SPARK-42704
> URL: https://issues.apache.org/jira/browse/SPARK-42704
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.3.2, 3.4.0
> Reporter: Ryan Johnson
> Priority: Major
>
> The `AddMetadataColumns` analyzer rule intends to make available metadata columns resolvable, even if the plan already contains projections that did not explicitly mention the metadata column.
> The `SubqueryAlias` plan node intentionally does not propagate metadata columns automatically from a non-leaf/non-subquery child node, because the following should _not_ work:
>
> {code:java}
> spark.read.table("t").select("a", "b").as("s").select("_metadata"){code}
> However, today it is too strict and breaks the metadata chain when the child node's output already includes the metadata column:
>
> {code:java}
> // expected to work (and does)
> spark.read.table("t")
>   .select("a", "b").select("_metadata")
>
> // by extension, should also work (but does not)
> spark.read.table("t").select("a", "b", "_metadata").as("s")
>   .select("a", "b").select("_metadata"){code}
> The solution is for `SubqueryAlias` to always propagate metadata columns that are already in the child's output, thus preserving the `metadataOutput` chain for that column.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42704) SubqueryAlias should propagate metadata columns its child already selects
[ https://issues.apache.org/jira/browse/SPARK-42704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42704: Assignee: (was: Apache Spark)
> SubqueryAlias should propagate metadata columns its child already selects
> ---
>
> Key: SPARK-42704
> URL: https://issues.apache.org/jira/browse/SPARK-42704
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.3.2, 3.4.0
> Reporter: Ryan Johnson
> Priority: Major
>
> The `AddMetadataColumns` analyzer rule intends to make available metadata columns resolvable, even if the plan already contains projections that did not explicitly mention the metadata column.
> The `SubqueryAlias` plan node intentionally does not propagate metadata columns automatically from a non-leaf/non-subquery child node, because the following should _not_ work:
>
> {code:java}
> spark.read.table("t").select("a", "b").as("s").select("_metadata"){code}
> However, today it is too strict and breaks the metadata chain when the child node's output already includes the metadata column:
>
> {code:java}
> // expected to work (and does)
> spark.read.table("t")
>   .select("a", "b").select("_metadata")
>
> // by extension, should also work (but does not)
> spark.read.table("t").select("a", "b", "_metadata").as("s")
>   .select("a", "b").select("_metadata"){code}
> The solution is for `SubqueryAlias` to always propagate metadata columns that are already in the child's output, thus preserving the `metadataOutput` chain for that column.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42692) Implement Dataset.toJson
[ https://issues.apache.org/jira/browse/SPARK-42692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42692: Assignee: Apache Spark
> Implement Dataset.toJson
> ---
>
> Key: SPARK-42692
> URL: https://issues.apache.org/jira/browse/SPARK-42692
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Herman van Hövell
> Assignee: Apache Spark
> Priority: Major
>
> Implement Dataset.toJSON:
>
> {code:java}
> /**
>  * Returns the content of the Dataset as a Dataset of JSON strings.
>  * @since 3.4.0
>  */
> def toJSON: Dataset[String]{code}
>
> Please see if we can implement this using {{project(to_json(struct(*))).as(StringEncoder)}}.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
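Following the ticket's hint, a minimal sketch of the approach. The column functions are the standard org.apache.spark.sql.functions ones; the StringEncoder import path and the `.as(...)` overload taking an encoder are assumptions about the Connect client, not confirmed API:
{code:scala}
import org.apache.spark.sql.catalyst.encoders.AgnosticEncoders.StringEncoder
import org.apache.spark.sql.functions.{col, struct, to_json}

// Sketch: pack all columns into a struct, serialize each row with to_json,
// and re-type the resulting single string column as Dataset[String].
def toJSON: Dataset[String] =
  select(to_json(struct(col("*")))).as(StringEncoder)
{code}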
[jira] [Commented] (SPARK-42692) Implement Dataset.toJson
[ https://issues.apache.org/jira/browse/SPARK-42692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697467#comment-17697467 ] Apache Spark commented on SPARK-42692: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40319
> Implement Dataset.toJson
> ---
>
> Key: SPARK-42692
> URL: https://issues.apache.org/jira/browse/SPARK-42692
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Herman van Hövell
> Priority: Major
>
> Implement Dataset.toJSON:
>
> {code:java}
> /**
>  * Returns the content of the Dataset as a Dataset of JSON strings.
>  * @since 3.4.0
>  */
> def toJSON: Dataset[String]{code}
>
> Please see if we can implement this using {{project(to_json(struct(*))).as(StringEncoder)}}.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42692) Implement Dataset.toJson
[ https://issues.apache.org/jira/browse/SPARK-42692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42692: Assignee: (was: Apache Spark)
> Implement Dataset.toJson
> ---
>
> Key: SPARK-42692
> URL: https://issues.apache.org/jira/browse/SPARK-42692
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Herman van Hövell
> Priority: Major
>
> Implement Dataset.toJSON:
>
> {code:java}
> /**
>  * Returns the content of the Dataset as a Dataset of JSON strings.
>  * @since 3.4.0
>  */
> def toJSON: Dataset[String]{code}
>
> Please see if we can implement this using {{project(to_json(struct(*))).as(StringEncoder)}}.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42656) Spark Connect Scala Client Shell Script
[ https://issues.apache.org/jira/browse/SPARK-42656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697423#comment-17697423 ] Apache Spark commented on SPARK-42656: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40318
> Spark Connect Scala Client Shell Script
> ---
>
> Key: SPARK-42656
> URL: https://issues.apache.org/jira/browse/SPARK-42656
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Zhen Li
> Assignee: Zhen Li
> Priority: Major
> Fix For: 3.4.0
>
> Adding a shell script that runs the Scala client in a Scala REPL, so users can connect to Spark Connect.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42700) Add h2 as test dependency of connect-server module
[ https://issues.apache.org/jira/browse/SPARK-42700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697417#comment-17697417 ] Apache Spark commented on SPARK-42700: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40317 > Add h2 as test dependency of connect-server module > -- > > Key: SPARK-42700 > URL: https://issues.apache.org/jira/browse/SPARK-42700 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > run > # mvn clean install -DskipTests -pl connector/connect/server -am > # mvn test -pl connector/connect/server > {code:java} > *** RUN ABORTED *** > java.lang.ClassNotFoundException: org.h2.Driver > at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522) > at java.base/java.lang.Class.forName0(Native Method) > at java.base/java.lang.Class.forName(Class.java:398) > at org.apache.spark.util.Utils$.classForName(Utils.scala:225) > at > org.apache.spark.sql.connect.ProtoToParsedPlanTestSuite.beforeAll(ProtoToParsedPlanTestSuite.scala:68) > at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) > at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > ... > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
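The fix presumably amounts to a test-scoped dependency in connector/connect/server/pom.xml along these lines. The H2 coordinates are the standard ones; the version would be managed by the parent pom, so this is the shape of the change rather than the exact diff:
{code:xml}
<!-- H2 in-memory database, needed by JDBC-related tests such as
     ProtoToParsedPlanTestSuite, which loads org.h2.Driver. -->
<dependency>
  <groupId>com.h2database</groupId>
  <artifactId>h2</artifactId>
  <scope>test</scope>
</dependency>
{code}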
[jira] [Assigned] (SPARK-42700) Add h2 as test dependency of connect-server module
[ https://issues.apache.org/jira/browse/SPARK-42700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42700: Assignee: (was: Apache Spark) > Add h2 as test dependency of connect-server module > -- > > Key: SPARK-42700 > URL: https://issues.apache.org/jira/browse/SPARK-42700 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > run > # mvn clean install -DskipTests -pl connector/connect/server -am > # mvn test -pl connector/connect/server > {code:java} > *** RUN ABORTED *** > java.lang.ClassNotFoundException: org.h2.Driver > at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522) > at java.base/java.lang.Class.forName0(Native Method) > at java.base/java.lang.Class.forName(Class.java:398) > at org.apache.spark.util.Utils$.classForName(Utils.scala:225) > at > org.apache.spark.sql.connect.ProtoToParsedPlanTestSuite.beforeAll(ProtoToParsedPlanTestSuite.scala:68) > at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) > at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > ... > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42700) Add h2 as test dependency of connect-server module
[ https://issues.apache.org/jira/browse/SPARK-42700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42700: Assignee: Apache Spark > Add h2 as test dependency of connect-server module > -- > > Key: SPARK-42700 > URL: https://issues.apache.org/jira/browse/SPARK-42700 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > run > # mvn clean install -DskipTests -pl connector/connect/server -am > # mvn test -pl connector/connect/server > {code:java} > *** RUN ABORTED *** > java.lang.ClassNotFoundException: org.h2.Driver > at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522) > at java.base/java.lang.Class.forName0(Native Method) > at java.base/java.lang.Class.forName(Class.java:398) > at org.apache.spark.util.Utils$.classForName(Utils.scala:225) > at > org.apache.spark.sql.connect.ProtoToParsedPlanTestSuite.beforeAll(ProtoToParsedPlanTestSuite.scala:68) > at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) > at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > ... > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42679) createDataFrame doesn't work with non-nullable schema.
[ https://issues.apache.org/jira/browse/SPARK-42679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42679: Assignee: Apache Spark > createDataFrame doesn't work with non-nullable schema. > -- > > Key: SPARK-42679 > URL: https://issues.apache.org/jira/browse/SPARK-42679 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > spark.createDataFrame won't work with non-nullable schema as below: > {code:java} > from pyspark.sql.types import * > schema_false = StructType([StructField("id", IntegerType(), False)]) > spark.createDataFrame([[1]], schema=schema_false) > Traceback (most recent call last): > ... > pyspark.errors.exceptions.connect.AnalysisException: > [NULLABLE_COLUMN_OR_FIELD] Column or field `id` is nullable while it's > required to be non-nullable.{code} > whereas it works fine with nullable schema: > {code:java} > schema_true = StructType([StructField("id", IntegerType(), True)]) > spark.createDataFrame([[1]], schema=schema_true) > DataFrame[id: int]{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42679) createDataFrame doesn't work with non-nullable schema.
[ https://issues.apache.org/jira/browse/SPARK-42679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697412#comment-17697412 ] Apache Spark commented on SPARK-42679: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/40316 > createDataFrame doesn't work with non-nullable schema. > -- > > Key: SPARK-42679 > URL: https://issues.apache.org/jira/browse/SPARK-42679 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > spark.createDataFrame won't work with non-nullable schema as below: > {code:java} > from pyspark.sql.types import * > schema_false = StructType([StructField("id", IntegerType(), False)]) > spark.createDataFrame([[1]], schema=schema_false) > Traceback (most recent call last): > ... > pyspark.errors.exceptions.connect.AnalysisException: > [NULLABLE_COLUMN_OR_FIELD] Column or field `id` is nullable while it's > required to be non-nullable.{code} > whereas it works fine with nullable schema: > {code:java} > schema_true = StructType([StructField("id", IntegerType(), True)]) > spark.createDataFrame([[1]], schema=schema_true) > DataFrame[id: int]{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42679) createDataFrame doesn't work with non-nullable schema.
[ https://issues.apache.org/jira/browse/SPARK-42679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42679: Assignee: (was: Apache Spark) > createDataFrame doesn't work with non-nullable schema. > -- > > Key: SPARK-42679 > URL: https://issues.apache.org/jira/browse/SPARK-42679 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > spark.createDataFrame won't work with non-nullable schema as below: > {code:java} > from pyspark.sql.types import * > schema_false = StructType([StructField("id", IntegerType(), False)]) > spark.createDataFrame([[1]], schema=schema_false) > Traceback (most recent call last): > ... > pyspark.errors.exceptions.connect.AnalysisException: > [NULLABLE_COLUMN_OR_FIELD] Column or field `id` is nullable while it's > required to be non-nullable.{code} > whereas it works fine with nullable schema: > {code:java} > schema_true = StructType([StructField("id", IntegerType(), True)]) > spark.createDataFrame([[1]], schema=schema_true) > DataFrame[id: int]{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42699) SparkConnectServer should make client and AM same exit code
[ https://issues.apache.org/jira/browse/SPARK-42699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697373#comment-17697373 ] Apache Spark commented on SPARK-42699: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/40315 > SparkConnectServer should make client and AM same exit code > --- > > Key: SPARK-42699 > URL: https://issues.apache.org/jira/browse/SPARK-42699 > Project: Spark > Issue Type: Sub-task > Components: Connect, Spark Core >Affects Versions: 3.5.0 >Reporter: angerszhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42699) SparkConnectServer should make client and AM same exit code
[ https://issues.apache.org/jira/browse/SPARK-42699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42699: Assignee: Apache Spark > SparkConnectServer should make client and AM same exit code > --- > > Key: SPARK-42699 > URL: https://issues.apache.org/jira/browse/SPARK-42699 > Project: Spark > Issue Type: Sub-task > Components: Connect, Spark Core >Affects Versions: 3.5.0 >Reporter: angerszhu >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42699) SparkConnectServer should make client and AM same exit code
[ https://issues.apache.org/jira/browse/SPARK-42699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42699: Assignee: (was: Apache Spark) > SparkConnectServer should make client and AM same exit code > --- > > Key: SPARK-42699 > URL: https://issues.apache.org/jira/browse/SPARK-42699 > Project: Spark > Issue Type: Sub-task > Components: Connect, Spark Core >Affects Versions: 3.5.0 >Reporter: angerszhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42698) Client mode submit task client should keep same exitcode with AM
[ https://issues.apache.org/jira/browse/SPARK-42698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42698: Assignee: (was: Apache Spark) > Client mode submit task client should keep same exitcode with AM > > > Key: SPARK-42698 > URL: https://issues.apache.org/jira/browse/SPARK-42698 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.5.0 >Reporter: angerszhu >Priority: Major >
> ```scala
> try {
>   app.start(childArgs.toArray, sparkConf)
> } catch {
>   case t: Throwable =>
>     throw findCause(t)
> } finally {
>   if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) &&
>       !isThriftServer(args.mainClass)) {
>     try {
>       SparkContext.getActive.foreach(_.stop())
>     } catch {
>       case e: Throwable => logError(s"Failed to close SparkContext: $e")
>     }
>   }
> }
> ```
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42698) Client mode submit task client should keep same exitcode with AM
[ https://issues.apache.org/jira/browse/SPARK-42698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42698: Assignee: Apache Spark > Client mode submit task client should keep same exitcode with AM > > > Key: SPARK-42698 > URL: https://issues.apache.org/jira/browse/SPARK-42698 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.5.0 >Reporter: angerszhu >Assignee: Apache Spark >Priority: Major >
> ```scala
> try {
>   app.start(childArgs.toArray, sparkConf)
> } catch {
>   case t: Throwable =>
>     throw findCause(t)
> } finally {
>   if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) &&
>       !isThriftServer(args.mainClass)) {
>     try {
>       SparkContext.getActive.foreach(_.stop())
>     } catch {
>       case e: Throwable => logError(s"Failed to close SparkContext: $e")
>     }
>   }
> }
> ```
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42698) Client mode submit task client should keep same exitcode with AM
[ https://issues.apache.org/jira/browse/SPARK-42698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697364#comment-17697364 ] Apache Spark commented on SPARK-42698: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/40314 > Client mode submit task client should keep same exitcode with AM > > > Key: SPARK-42698 > URL: https://issues.apache.org/jira/browse/SPARK-42698 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.5.0 >Reporter: angerszhu >Priority: Major >
> ```scala
> try {
>   app.start(childArgs.toArray, sparkConf)
> } catch {
>   case t: Throwable =>
>     throw findCause(t)
> } finally {
>   if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) &&
>       !isThriftServer(args.mainClass)) {
>     try {
>       SparkContext.getActive.foreach(_.stop())
>     } catch {
>       case e: Throwable => logError(s"Failed to close SparkContext: $e")
>     }
>   }
> }
> ```
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42698) Client mode submit task client should keep same exitcode with AM
[ https://issues.apache.org/jira/browse/SPARK-42698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697365#comment-17697365 ] Apache Spark commented on SPARK-42698: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/40314 > Client mode submit task client should keep same exitcode with AM > > > Key: SPARK-42698 > URL: https://issues.apache.org/jira/browse/SPARK-42698 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.5.0 >Reporter: angerszhu >Priority: Major >
> ```scala
> try {
>   app.start(childArgs.toArray, sparkConf)
> } catch {
>   case t: Throwable =>
>     throw findCause(t)
> } finally {
>   if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) &&
>       !isThriftServer(args.mainClass)) {
>     try {
>       SparkContext.getActive.foreach(_.stop())
>     } catch {
>       case e: Throwable => logError(s"Failed to close SparkContext: $e")
>     }
>   }
> }
> ```
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
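The excerpt above is Scala from Spark's submit path; the behavior being requested — the submitting client process exiting with the same code as the application it launched — can be illustrated with a minimal language-neutral sketch. The script name and arguments below are placeholders, not the actual fix:
{code:python}
import subprocess
import sys

# Minimal sketch of exit-code propagation: run the application and exit
# with its return code instead of unconditionally returning 0.
result = subprocess.run(["spark-submit", "--deploy-mode", "client", "app.py"])
sys.exit(result.returncode)
{code}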
[jira] [Commented] (SPARK-42697) /api/v1/applications return 0 for duration
[ https://issues.apache.org/jira/browse/SPARK-42697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697356#comment-17697356 ] Apache Spark commented on SPARK-42697: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/40313 > /api/v1/applications return 0 for duration > -- > > Key: SPARK-42697 > URL: https://issues.apache.org/jira/browse/SPARK-42697 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.1.3, 3.2.3, 3.3.2, 3.4.0 >Reporter: Kent Yao >Priority: Major > > The duration field should report the application's total uptime. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42697) /api/v1/applications return 0 for duration
[ https://issues.apache.org/jira/browse/SPARK-42697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42697: Assignee: (was: Apache Spark) > /api/v1/applications return 0 for duration > -- > > Key: SPARK-42697 > URL: https://issues.apache.org/jira/browse/SPARK-42697 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.1.3, 3.2.3, 3.3.2, 3.4.0 >Reporter: Kent Yao >Priority: Major > > The duration field should report the application's total uptime. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42697) /api/v1/applications return 0 for duration
[ https://issues.apache.org/jira/browse/SPARK-42697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42697: Assignee: Apache Spark > /api/v1/applications return 0 for duration > -- > > Key: SPARK-42697 > URL: https://issues.apache.org/jira/browse/SPARK-42697 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.1.3, 3.2.3, 3.3.2, 3.4.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > > The duration field should report the application's total uptime. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
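For context, the endpoint in question is part of the Spark status REST API. A minimal sketch for reproducing the report against a locally running application (host and port are assumptions; 4040 is the default UI port):
{code:python}
import json
from urllib.request import urlopen

with urlopen("http://localhost:4040/api/v1/applications") as resp:
    apps = json.load(resp)

for app in apps:
    for attempt in app["attempts"]:
        # The ticket reports this field as 0; it should be the total uptime.
        print(app["id"], attempt["duration"])
{code}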
[jira] [Assigned] (SPARK-42695) Skew join handling in stream side of broadcast hash join
[ https://issues.apache.org/jira/browse/SPARK-42695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42695: Assignee: Apache Spark > Skew join handling in stream side of broadcast hash join > > > Key: SPARK-42695 > URL: https://issues.apache.org/jira/browse/SPARK-42695 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Xingchao, Zhang >Assignee: Apache Spark >Priority: Major > Attachments: before-01.png > > We can extend the current OptimizeSkewedJoin rule when data skew is detected on the stream side of a broadcast hash join. > > !before-01.png|width=609,height=626! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42695) Skew join handling in stream side of broadcast hash join
[ https://issues.apache.org/jira/browse/SPARK-42695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697306#comment-17697306 ] Apache Spark commented on SPARK-42695: -- User 'xingchaozh' has created a pull request for this issue: https://github.com/apache/spark/pull/40312 > Skew join handling in stream side of broadcast hash join > > > Key: SPARK-42695 > URL: https://issues.apache.org/jira/browse/SPARK-42695 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Xingchao, Zhang >Priority: Major > Attachments: before-01.png > > We can extend the current OptimizeSkewedJoin rule when data skew is detected on the stream side of a broadcast hash join. > > !before-01.png|width=609,height=626! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42695) Skew join handling in stream side of broadcast hash join
[ https://issues.apache.org/jira/browse/SPARK-42695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42695: Assignee: (was: Apache Spark) > Skew join handling in stream side of broadcast hash join > > > Key: SPARK-42695 > URL: https://issues.apache.org/jira/browse/SPARK-42695 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Xingchao, Zhang >Priority: Major > Attachments: before-01.png > > We can extend the current OptimizeSkewedJoin rule when data skew is detected on the stream side of a broadcast hash join. > > !before-01.png|width=609,height=626! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
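For context, OptimizeSkewedJoin today targets shuffle-based joins (sort-merge and shuffled hash), driven by the AQE settings below; this ticket proposes applying the same partition-splitting idea to the shuffled (stream) side of a broadcast hash join. A sketch of the existing knobs, assuming an active `spark` session (values are examples, not recommendations):
{code:python}
# Existing AQE skew-join configuration, shown for context.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionFactor", "5")
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "256MB")
{code}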
[jira] [Commented] (SPARK-42559) Implement DataFrameNaFunctions
[ https://issues.apache.org/jira/browse/SPARK-42559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697249#comment-17697249 ] Apache Spark commented on SPARK-42559: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/40311 > Implement DataFrameNaFunctions > -- > > Key: SPARK-42559 > URL: https://issues.apache.org/jira/browse/SPARK-42559 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: BingKun Pan >Priority: Major > Fix For: 3.4.1 > > Implement DataFrameNaFunctions for Connect and hook it up to Dataset. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
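For readers unfamiliar with the API being ported: typical DataFrameNaFunctions usage that the Connect `Dataset.na` hook needs to cover looks like the following, shown in PySpark for illustration (the ticket itself targets the Scala client) against an active `spark` session:
{code:python}
df = spark.createDataFrame([(1, None), (None, "b")], ["x", "y"])
df.na.fill({"x": 0, "y": ""}).show()          # replace nulls per column
df.na.drop(how="any").show()                  # drop rows containing any null
df.na.replace("b", "B", subset=["y"]).show()  # replace matching values
{code}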
[jira] [Commented] (SPARK-42022) createDataFrame should autogenerate missing column names
[ https://issues.apache.org/jira/browse/SPARK-42022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697188#comment-17697188 ] Apache Spark commented on SPARK-42022: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40310 > createDataFrame should autogenerate missing column names > > > Key: SPARK-42022 > URL: https://issues.apache.org/jira/browse/SPARK-42022 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major >
> {code}
> pyspark/sql/tests/test_types.py:233 (TypesParityTests.test_infer_schema_not_enough_names)
> ['col1', '_2'] != ['col1']
> Expected :['col1']
> Actual   :['col1', '_2']
>
> self = <TypesParityTests testMethod=test_infer_schema_not_enough_names>
>
> def test_infer_schema_not_enough_names(self):
>     df = self.spark.createDataFrame([["a", "b"]], ["col1"])
>     self.assertEqual(df.columns, ["col1", "_2"])
>
> ../test_types.py:236: AssertionError
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42022) createDataFrame should autogenerate missing column names
[ https://issues.apache.org/jira/browse/SPARK-42022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42022: Assignee: (was: Apache Spark) > createDataFrame should autogenerate missing column names > > > Key: SPARK-42022 > URL: https://issues.apache.org/jira/browse/SPARK-42022 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major >
> {code}
> pyspark/sql/tests/test_types.py:233 (TypesParityTests.test_infer_schema_not_enough_names)
> ['col1', '_2'] != ['col1']
> Expected :['col1']
> Actual   :['col1', '_2']
>
> self = <TypesParityTests testMethod=test_infer_schema_not_enough_names>
>
> def test_infer_schema_not_enough_names(self):
>     df = self.spark.createDataFrame([["a", "b"]], ["col1"])
>     self.assertEqual(df.columns, ["col1", "_2"])
>
> ../test_types.py:236: AssertionError
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42022) createDataFrame should autogenerate missing column names
[ https://issues.apache.org/jira/browse/SPARK-42022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42022: Assignee: Apache Spark > createDataFrame should autogenerate missing column names > > > Key: SPARK-42022 > URL: https://issues.apache.org/jira/browse/SPARK-42022 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major >
> {code}
> pyspark/sql/tests/test_types.py:233 (TypesParityTests.test_infer_schema_not_enough_names)
> ['col1', '_2'] != ['col1']
> Expected :['col1']
> Actual   :['col1', '_2']
>
> self = <TypesParityTests testMethod=test_infer_schema_not_enough_names>
>
> def test_infer_schema_not_enough_names(self):
>     df = self.spark.createDataFrame([["a", "b"]], ["col1"])
>     self.assertEqual(df.columns, ["col1", "_2"])
>
> ../test_types.py:236: AssertionError
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
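The expected behavior, per the parity test quoted above: when fewer names than columns are supplied, classic PySpark autogenerates the missing names as _2, _3, and so on, and Connect should match it. Assuming an active `spark` session:
{code:python}
df = spark.createDataFrame([["a", "b"]], ["col1"])
print(df.columns)  # ['col1', '_2'] -- the second name is autogenerated
{code}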
[jira] [Assigned] (SPARK-42688) Rename Connect proto Request client_id to session_id
[ https://issues.apache.org/jira/browse/SPARK-42688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42688: Assignee: (was: Apache Spark) > Rename Connect proto Request client_id to session_id > > > Key: SPARK-42688 > URL: https://issues.apache.org/jira/browse/SPARK-42688 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42688) Rename Connect proto Request client_id to session_id
[ https://issues.apache.org/jira/browse/SPARK-42688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697163#comment-17697163 ] Apache Spark commented on SPARK-42688: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/40309 > Rename Connect proto Request client_id to session_id > > > Key: SPARK-42688 > URL: https://issues.apache.org/jira/browse/SPARK-42688 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42688) Rename Connect proto Request client_id to session_id
[ https://issues.apache.org/jira/browse/SPARK-42688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42688: Assignee: Apache Spark > Rename Connect proto Request client_id to session_id > > > Key: SPARK-42688 > URL: https://issues.apache.org/jira/browse/SPARK-42688 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42656) Spark Connect Scala Client Shell Script
[ https://issues.apache.org/jira/browse/SPARK-42656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697162#comment-17697162 ] Apache Spark commented on SPARK-42656: -- User 'zhenlineo' has created a pull request for this issue: https://github.com/apache/spark/pull/40305 > Spark Connect Scala Client Shell Script > --- > > Key: SPARK-42656 > URL: https://issues.apache.org/jira/browse/SPARK-42656 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Assignee: Zhen Li >Priority: Major > Fix For: 3.4.0 > > Add a shell script that runs the Scala client in a Scala REPL so users can connect to Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42151) Align UPDATE assignments with table attributes
[ https://issues.apache.org/jira/browse/SPARK-42151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42151: Assignee: (was: Apache Spark) > Align UPDATE assignments with table attributes > -- > > Key: SPARK-42151 > URL: https://issues.apache.org/jira/browse/SPARK-42151 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > Assignments in UPDATE commands should be aligned with table attributes prior to rewriting those UPDATE commands. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42151) Align UPDATE assignments with table attributes
[ https://issues.apache.org/jira/browse/SPARK-42151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42151: Assignee: Apache Spark > Align UPDATE assignments with table attributes > -- > > Key: SPARK-42151 > URL: https://issues.apache.org/jira/browse/SPARK-42151 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Assignee: Apache Spark >Priority: Major > > Assignments in UPDATE commands should be aligned with table attributes prior to rewriting those UPDATE commands. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
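To make the alignment concrete: a SET clause may name only a subset of columns, in arbitrary order, and each assignment has to be matched to the corresponding table attribute (with untouched columns preserved) before the command is rewritten. A hypothetical illustration — the table and columns are invented, and UPDATE only runs against data sources that support row-level operations:
{code:python}
# t is assumed to have attributes (id INT, name STRING, age INT).
# Alignment maps the out-of-order SET entries onto (name, age) and keeps id as-is.
spark.sql("""
    UPDATE t
    SET age = 42, name = 'unknown'
    WHERE id = 1
""")
{code}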