[jira] [Commented] (SPARK-26819) ArrayIndexOutOfBoundsException while loading a CSV to a Dataset with dependencies spark-core_2.12 and spark-sql_2.12 (with spark-core_2.11 and spark-sql_2.11 : working fine)
[ https://issues.apache.org/jira/browse/SPARK-26819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759617#comment-16759617 ] Hyukjin Kwon commented on SPARK-26819: -- It works fine in Spark. Can you clarify what {quote} if Spark 2.4.0 is associated to dependencies spark-core_2.12 and spark-sql_2.12, but works fine with spark-core_2.11 and spark-sql_2.11. {quote} means? > ArrayIndexOutOfBoundsException while loading a CSV to a Dataset with > dependencies spark-core_2.12 and spark-sql_2.12 (with spark-core_2.11 and > spark-sql_2.11 : working fine) > - > > Key: SPARK-26819 > URL: https://issues.apache.org/jira/browse/SPARK-26819 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 2.4.0 > Environment: Java 8, Windows 7. >Reporter: M. Le Bihan >Priority: Major > Attachments: CompteResultatCSV.java, ComptesResultatsIT.java, > comptes-communes-Entr'Allier.csv > > > A simple CSV reading to a Dataset fails if Spark 2.4.0 is associated to > dependencies spark-core_2.12 and spark-sql_2.12, but works fine with > spark-core_2.11 and spark-sql_2.11. 
> > With _2.12, I encounter this stacktrace : > > {{java.lang.ArrayIndexOutOfBoundsException: 10582}} > {{ at > com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.accept(BytecodeReadingParanamer.java:563)}} > {{ at > com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.access$200(BytecodeReadingParanamer.java:338)}} > {{ at > com.thoughtworks.paranamer.BytecodeReadingParanamer.lookupParameterNames(BytecodeReadingParanamer.java:103)}} > {{ at > com.thoughtworks.paranamer.CachingParanamer.lookupParameterNames(CachingParanamer.java:90)}} > {{ at > com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.getCtorParams(BeanIntrospector.scala:44)}} > {{ at > com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1(BeanIntrospector.scala:58)}} > {{ at > com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1$adapted(BeanIntrospector.scala:58)}} > {{ at > scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:240)}} > {{ at scala.collection.Iterator.foreach(Iterator.scala:937)}} > {{ at scala.collection.Iterator.foreach$(Iterator.scala:937)}} > {{ at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)}} > {{ at scala.collection.IterableLike.foreach(IterableLike.scala:70)}} > {{ at scala.collection.IterableLike.foreach$(IterableLike.scala:69)}} > {{ at scala.collection.AbstractIterable.foreach(Iterable.scala:54)}} > {{ at scala.collection.TraversableLike.flatMap(TraversableLike.scala:240)}} > {{ at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:237)}} > {{ at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)}} > {{ at > com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.findConstructorParam$1(BeanIntrospector.scala:58)}} > {{ at > com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$19(BeanIntrospector.scala:176)}} > {{ at > 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)}} > {{ at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32)}} > {{ at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29)}} > {{ at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:194)}} > {{ at scala.collection.TraversableLike.map(TraversableLike.scala:233)}} > {{ at scala.collection.TraversableLike.map$(TraversableLike.scala:226)}} > {{ at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:194)}} > {{ at > com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$14(BeanIntrospector.scala:170)}} > {{ at > com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$14$adapted(BeanIntrospector.scala:169)}} > {{ at > scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:240)}} > {{ at scala.collection.immutable.List.foreach(List.scala:388)}} > {{ at scala.collection.TraversableLike.flatMap(TraversableLike.scala:240)}} > {{ at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:237)}} > {{ at scala.collection.immutable.List.flatMap(List.scala:351)}} > {{ at > com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.apply(BeanIntrospector.scala:169)}} > {{ at > com.fasterxml.jackson.module.scala.introspect.ScalaAnnotationIntrospector$._descriptorFor(ScalaAnnotationIntrospectorModule.scala:21)}} > {{ at > com.fasterxml.jackson.module.scala.introspect.ScalaAnnotationIntrospector$.fieldName(ScalaAnnotationIntrospectorModule.scala:29)}} > {{ at
[jira] [Assigned] (SPARK-26603) Update minikube backend in K8s integration tests
[ https://issues.apache.org/jira/browse/SPARK-26603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung reassigned SPARK-26603: Assignee: Stavros Kontopoulos > Update minikube backend in K8s integration tests > > > Key: SPARK-26603 > URL: https://issues.apache.org/jira/browse/SPARK-26603 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Stavros Kontopoulos >Assignee: Stavros Kontopoulos >Priority: Major > > Minikube status command has changed > ([https://github.com/kubernetes/minikube/commit/cb3624dd089e7ab0c03fbfb81f20c2bde43a60f3#diff-bd0534bbb0703b4170d467d074373788]) > in the latest releases >0.30. > Old output: > {quote}minikube status > There is a newer version of minikube available (v0.31.0). Download it here: > [https://github.com/kubernetes/minikube/releases/tag/v0.31.0] > To disable this notification, run the following: > minikube config set WantUpdateNotification false > minikube: > cluster: > kubectl: > {quote} > new output: > {quote}minikube status > host: Running > kubelet: Running > apiserver: Running > kubectl: Correctly Configured: pointing to minikube-vm at 172.31.34.77 > {quote} > That means users with latest version of minikube will not be able to run the > integration tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
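The format change described above is what breaks the tests: the old `minikube status` output used nested `minikube:`/`cluster:`/`kubectl:` keys, while releases newer than 0.30 emit a flat `key: value` listing. A minimal Python sketch of a readiness check against the new format (a hypothetical helper for illustration only; the actual integration-test harness is written in Scala):

```python
def minikube_ready(status_output):
    """Check the new (minikube > 0.30) flat `key: value` status format.

    Illustrative only: component names and healthy states are taken from
    the sample output quoted in the issue.
    """
    states = {}
    for line in status_output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            states[key.strip()] = value.strip()
    # Every component the tests care about must report a healthy state.
    required = (
        ("host", "Running"),
        ("kubelet", "Running"),
        ("apiserver", "Running"),
        ("kubectl", "Correctly Configured"),
    )
    return all(
        states.get(key, "").startswith(healthy) for key, healthy in required
    )
```

A parser written against the old nested format finds none of these keys and wrongly reports the cluster as down, which is why users on recent minikube versions cannot run the integration tests.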
[jira] [Resolved] (SPARK-26603) Update minikube backend in K8s integration tests
[ https://issues.apache.org/jira/browse/SPARK-26603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-26603. -- Resolution: Fixed Fix Version/s: 3.0.0 > Update minikube backend in K8s integration tests > > > Key: SPARK-26603 > URL: https://issues.apache.org/jira/browse/SPARK-26603 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Stavros Kontopoulos >Assignee: Stavros Kontopoulos >Priority: Major > Fix For: 3.0.0 > > > Minikube status command has changed > ([https://github.com/kubernetes/minikube/commit/cb3624dd089e7ab0c03fbfb81f20c2bde43a60f3#diff-bd0534bbb0703b4170d467d074373788]) > in the latest releases >0.30. > Old output: > {quote}minikube status > There is a newer version of minikube available (v0.31.0). Download it here: > [https://github.com/kubernetes/minikube/releases/tag/v0.31.0] > To disable this notification, run the following: > minikube config set WantUpdateNotification false > minikube: > cluster: > kubectl: > {quote} > new output: > {quote}minikube status > host: Running > kubelet: Running > apiserver: Running > kubectl: Correctly Configured: pointing to minikube-vm at 172.31.34.77 > {quote} > That means users with latest version of minikube will not be able to run the > integration tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26810) Fixing SPARK-25072 broke existing code and fails to show error message
[ https://issues.apache.org/jira/browse/SPARK-26810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-26810. -- Resolution: Duplicate SPARK-23299 (I think) > Fixing SPARK-25072 broke existing code and fails to show error message > -- > > Key: SPARK-26810 > URL: https://issues.apache.org/jira/browse/SPARK-26810 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.0 >Reporter: Arttu Voutilainen >Priority: Minor > > Hey, > We upgraded Spark recently, and > https://issues.apache.org/jira/browse/SPARK-25072 caused our pipeline to fail > after the upgrade. Annoyingly, the error message formatting also threw an > exception itself, thus hiding the message we should have seen. > Repro using gettyimages/docker-spark, on 2.4.0: > {code} > from pyspark.sql import Row > r = Row(['a','b']) > r('1', '2') > {code} > {code} > Traceback (most recent call last): > File "", line 1, in > File "/usr/spark-2.4.0/python/pyspark/sql/types.py", line 1505, in __call__ > "but got %s" % (self, len(self), args)) > File "/usr/spark-2.4.0/python/pyspark/sql/types.py", line 1552, in __repr__ > return "" % ", ".join(self) > TypeError: sequence item 0: expected str instance, list found > {code} > On 2.3.1, and also showing how this was used: > {code} > from pyspark.sql import Row, types as T > r = Row(['a','b']) > df = spark.createDataFrame([Row(col='doesntmatter')]) > rdd = df.rdd.mapPartitions(lambda p: [r('a1','b2')]) > spark.createDataFrame(rdd, T.StructType([T.StructField('a', T.StringType()), > T.StructField('b', T.StringType())])).collect() > {code} > {code} > [Row(a='a1', b='b2'), Row(a='a1', b='b2')] > {code} > While I do think the code we had was quite horrible, it used to work. The > unexpected error came from __repr__ as it assumes that the arguments given to > Row constructor are strings. That sounds like a reasonable assumption, should > the Row constructor validate that it holds true maybe? 
(I guess that might be > another potentially breaking change though, if someone has as weird code as > this one...)
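The error-masking mechanism reported here can be reproduced without Spark. A minimal Python stand-in for `pyspark.sql.Row` (hypothetical, for illustration only; the real class lives in `pyspark/sql/types.py`) shows how a `__repr__` that assumes string fields turns the intended length-check ValueError into a TypeError that hides the original message:

```python
class Row(tuple):
    """Minimal stand-in for pyspark.sql.Row, for illustration only."""

    def __new__(cls, *fields):
        return super().__new__(cls, fields)

    def __repr__(self):
        # Spark 2.4.0's __repr__ effectively does ", ".join(self),
        # which assumes every field is a str.
        return "<Row(%s)>" % ", ".join(self)


def format_error(row):
    """Mimic the failing error path: interpolating the row with %s calls
    __repr__, and any exception it raises masks the original message."""
    try:
        return "expected %d arguments, got row %s" % (len(row), row)
    except TypeError as e:
        return "masked: %s" % e
```

With a list-valued field, `", ".join(self)` raises `TypeError: sequence item 0: expected str instance, list found`, which is exactly the confusing traceback in the report.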
[jira] [Commented] (SPARK-26810) Fixing SPARK-25072 broke existing code and fails to show error message
[ https://issues.apache.org/jira/browse/SPARK-26810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759571#comment-16759571 ] Hyukjin Kwon commented on SPARK-26810: -- I'm leaving this as a duplicate of SPARK-23299. Thanks for reporting this with detailed info. > Fixing SPARK-25072 broke existing code and fails to show error message > -- > > Key: SPARK-26810 > URL: https://issues.apache.org/jira/browse/SPARK-26810 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.0 >Reporter: Arttu Voutilainen >Priority: Minor > > Hey, > We upgraded Spark recently, and > https://issues.apache.org/jira/browse/SPARK-25072 caused our pipeline to fail > after the upgrade. Annoyingly, the error message formatting also threw an > exception itself, thus hiding the message we should have seen. > Repro using gettyimages/docker-spark, on 2.4.0: > {code} > from pyspark.sql import Row > r = Row(['a','b']) > r('1', '2') > {code} > {code} > Traceback (most recent call last): > File "", line 1, in > File "/usr/spark-2.4.0/python/pyspark/sql/types.py", line 1505, in __call__ > "but got %s" % (self, len(self), args)) > File "/usr/spark-2.4.0/python/pyspark/sql/types.py", line 1552, in __repr__ > return "" % ", ".join(self) > TypeError: sequence item 0: expected str instance, list found > {code} > On 2.3.1, and also showing how this was used: > {code} > from pyspark.sql import Row, types as T > r = Row(['a','b']) > df = spark.createDataFrame([Row(col='doesntmatter')]) > rdd = df.rdd.mapPartitions(lambda p: [r('a1','b2')]) > spark.createDataFrame(rdd, T.StructType([T.StructField('a', T.StringType()), > T.StructField('b', T.StringType())])).collect() > {code} > {code} > [Row(a='a1', b='b2'), Row(a='a1', b='b2')] > {code} > While I do think the code we had was quite horrible, it used to work. The > unexpected error came from __repr__ as it assumes that the arguments given to > Row constructor are strings. 
That sounds like a reasonable assumption, should > the Row constructor validate that it holds true maybe? (I guess that might be > another potentially breaking change though, if someone has as weird code as > this one...)
[jira] [Issue Comment Deleted] (SPARK-26810) Fixing SPARK-25072 broke existing code and fails to show error message
[ https://issues.apache.org/jira/browse/SPARK-26810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-26810: - Comment: was deleted (was: SPARK-23299 (I think) > Fixing SPARK-25072 broke existing code and fails to show error message > -- > > Key: SPARK-26810 > URL: https://issues.apache.org/jira/browse/SPARK-26810 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.0 >Reporter: Arttu Voutilainen >Priority: Minor > > Hey, > We upgraded Spark recently, and > https://issues.apache.org/jira/browse/SPARK-25072 caused our pipeline to fail > after the upgrade. Annoyingly, the error message formatting also threw an > exception itself, thus hiding the message we should have seen. > Repro using gettyimages/docker-spark, on 2.4.0: > {code} > from pyspark.sql import Row > r = Row(['a','b']) > r('1', '2') > {code} > {code} > Traceback (most recent call last): > File "", line 1, in > File "/usr/spark-2.4.0/python/pyspark/sql/types.py", line 1505, in __call__ > "but got %s" % (self, len(self), args)) > File "/usr/spark-2.4.0/python/pyspark/sql/types.py", line 1552, in __repr__ > return "" % ", ".join(self) > TypeError: sequence item 0: expected str instance, list found > {code} > On 2.3.1, and also showing how this was used: > {code} > from pyspark.sql import Row, types as T > r = Row(['a','b']) > df = spark.createDataFrame([Row(col='doesntmatter')]) > rdd = df.rdd.mapPartitions(lambda p: [r('a1','b2')]) > spark.createDataFrame(rdd, T.StructType([T.StructField('a', T.StringType()), > T.StructField('b', T.StringType())])).collect() > {code} > {code} > [Row(a='a1', b='b2'), Row(a='a1', b='b2')] > {code} > While I do think the code we had was quite horrible, it used to work. The > unexpected error came from __repr__ as it assumes that the arguments given to > Row constructor are strings. That sounds like a reasonable assumption, should > the Row constructor validate that it holds true maybe? 
(I guess that might be > another potentially breaking change though, if someone has as weird code as > this one...)
[jira] [Commented] (SPARK-26810) Fixing SPARK-25072 broke existing code and fails to show error message
[ https://issues.apache.org/jira/browse/SPARK-26810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759549#comment-16759549 ] Arttu Voutilainen commented on SPARK-26810: --- Yup, no worries. Now that you understood the case, I'll leave it up to you to either close this as duplicate of SPARK-23299 (I think that's the only thing here that should be fixed some day), or if you want to keep it open to discuss the SPARK-25072 fix or something. > Fixing SPARK-25072 broke existing code and fails to show error message > -- > > Key: SPARK-26810 > URL: https://issues.apache.org/jira/browse/SPARK-26810 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.0 >Reporter: Arttu Voutilainen >Priority: Minor > > Hey, > We upgraded Spark recently, and > https://issues.apache.org/jira/browse/SPARK-25072 caused our pipeline to fail > after the upgrade. Annoyingly, the error message formatting also threw an > exception itself, thus hiding the message we should have seen. 
> Repro using gettyimages/docker-spark, on 2.4.0: > {code} > from pyspark.sql import Row > r = Row(['a','b']) > r('1', '2') > {code} > {code} > Traceback (most recent call last): > File "", line 1, in > File "/usr/spark-2.4.0/python/pyspark/sql/types.py", line 1505, in __call__ > "but got %s" % (self, len(self), args)) > File "/usr/spark-2.4.0/python/pyspark/sql/types.py", line 1552, in __repr__ > return "" % ", ".join(self) > TypeError: sequence item 0: expected str instance, list found > {code} > On 2.3.1, and also showing how this was used: > {code} > from pyspark.sql import Row, types as T > r = Row(['a','b']) > df = spark.createDataFrame([Row(col='doesntmatter')]) > rdd = df.rdd.mapPartitions(lambda p: [r('a1','b2')]) > spark.createDataFrame(rdd, T.StructType([T.StructField('a', T.StringType()), > T.StructField('b', T.StringType())])).collect() > {code} > {code} > [Row(a='a1', b='b2'), Row(a='a1', b='b2')] > {code} > While I do think the code we had was quite horrible, it used to work. The > unexpected error came from __repr__ as it assumes that the arguments given to > Row constructor are strings. That sounds like a reasonable assumption, should > the Row constructor validate that it holds true maybe? (I guess that might be > another potentially breaking change though, if someone has as weird code as > this one...) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26572) Join on distinct column with monotonically_increasing_id produces wrong output
[ https://issues.apache.org/jira/browse/SPARK-26572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26572: Assignee: Apache Spark > Join on distinct column with monotonically_increasing_id produces wrong output > -- > > Key: SPARK-26572 > URL: https://issues.apache.org/jira/browse/SPARK-26572 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.2, 2.4.0 > Environment: Running on Ubuntu 18.04LTS and Intellij 2018.2.5 >Reporter: Sören Reichardt >Assignee: Apache Spark >Priority: Major > > When joining a table with projected monotonically_increasing_id column after > calling distinct with another table the operators do not get executed in the > right order. > Here is a minimal example: > {code:java} > import org.apache.spark.sql.{DataFrame, SparkSession, functions} > object JoinBug extends App { > // Spark session setup > val session = SparkSession.builder().master("local[*]").getOrCreate() > import session.sqlContext.implicits._ > session.sparkContext.setLogLevel("error") > // Bug in Spark: "monotonically_increasing_id" is pushed down when it > shouldn't be. Push down only happens when the > // DF containing the "monotonically_increasing_id" expression is on the > left side of the join. 
> val baseTable = Seq((1), (1)).toDF("idx") > val distinctWithId = baseTable.distinct.withColumn("id", > functions.monotonically_increasing_id()) > val monotonicallyOnRight: DataFrame = baseTable.join(distinctWithId, "idx") > val monotonicallyOnLeft: DataFrame = distinctWithId.join(baseTable, "idx") > monotonicallyOnLeft.show // Wrong > monotonicallyOnRight.show // Ok in Spark 2.2.2 - also wrong in Spark 2.4.0 > } > {code} > It produces the following output: > {code:java} > Wrong: > +---++ > |idx| id | > +---++ > | 1|369367187456 | > | 1|369367187457 | > +---++ > Right: > +---++ > |idx| id | > +---++ > | 1|369367187456 | > | 1|369367187456 | > +---++ > {code} > We assume that the join operator triggers a pushdown of expressions > (monotonically_increasing_id in this case) which gets pushed down to be > executed before distinct. This produces non-distinct rows with unique id's. > However it seems like this behavior only appears if the table with the > projected expression is on the left side of the join in Spark 2.2.2 (for > version 2.4.0 it fails on both joins). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
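The suspected planner behaviour can be modelled in plain Python (an illustration of the mechanism, not Spark code): deduplicating first and then attaching ids yields one row, while pushing the non-deterministic id expression below `distinct` lets the fresh ids defeat deduplication, matching the "Wrong" output above.

```python
from itertools import count

_ids = count()


def next_id():
    # Non-deterministic, like monotonically_increasing_id(): every
    # evaluation produces a fresh value.
    return next(_ids)


base_table = [1, 1]

# Intended plan: deduplicate first, then attach ids -> one distinct row.
correct = [(idx, next_id()) for idx in sorted(set(base_table))]

# Buggy plan: the id expression is pushed below distinct, so the fresh ids
# make otherwise-identical rows unique and nothing is deduplicated.
pushed_down = sorted(set((idx, next_id()) for idx in base_table))
```

This is why the fix must keep non-deterministic expressions from being pushed through aggregation or deduplication operators.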
[jira] [Assigned] (SPARK-26572) Join on distinct column with monotonically_increasing_id produces wrong output
[ https://issues.apache.org/jira/browse/SPARK-26572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26572: Assignee: (was: Apache Spark) > Join on distinct column with monotonically_increasing_id produces wrong output > -- > > Key: SPARK-26572 > URL: https://issues.apache.org/jira/browse/SPARK-26572 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.2, 2.4.0 > Environment: Running on Ubuntu 18.04LTS and Intellij 2018.2.5 >Reporter: Sören Reichardt >Priority: Major > > When joining a table with projected monotonically_increasing_id column after > calling distinct with another table the operators do not get executed in the > right order. > Here is a minimal example: > {code:java} > import org.apache.spark.sql.{DataFrame, SparkSession, functions} > object JoinBug extends App { > // Spark session setup > val session = SparkSession.builder().master("local[*]").getOrCreate() > import session.sqlContext.implicits._ > session.sparkContext.setLogLevel("error") > // Bug in Spark: "monotonically_increasing_id" is pushed down when it > shouldn't be. Push down only happens when the > // DF containing the "monotonically_increasing_id" expression is on the > left side of the join. 
> val baseTable = Seq((1), (1)).toDF("idx") > val distinctWithId = baseTable.distinct.withColumn("id", > functions.monotonically_increasing_id()) > val monotonicallyOnRight: DataFrame = baseTable.join(distinctWithId, "idx") > val monotonicallyOnLeft: DataFrame = distinctWithId.join(baseTable, "idx") > monotonicallyOnLeft.show // Wrong > monotonicallyOnRight.show // Ok in Spark 2.2.2 - also wrong in Spark 2.4.0 > } > {code} > It produces the following output: > {code:java} > Wrong: > +---++ > |idx| id | > +---++ > | 1|369367187456 | > | 1|369367187457 | > +---++ > Right: > +---++ > |idx| id | > +---++ > | 1|369367187456 | > | 1|369367187456 | > +---++ > {code} > We assume that the join operator triggers a pushdown of expressions > (monotonically_increasing_id in this case) which gets pushed down to be > executed before distinct. This produces non-distinct rows with unique id's. > However it seems like this behavior only appears if the table with the > projected expression is on the left side of the join in Spark 2.2.2 (for > version 2.4.0 it fails on both joins). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26810) Fixing SPARK-25072 broke existing code and fails to show error message
[ https://issues.apache.org/jira/browse/SPARK-26810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759468#comment-16759468 ] Hyukjin Kwon commented on SPARK-26810: -- Ah, gotcha. Yeah, it looks like the cause indeed. Sorry that I rushed to read. BTW, I think we had better clearly define what we support and don't support. Given my experience so far, and due to the nature of Python, there are many holes; it would be nicer if we could whitelist what we support (what we document). > Fixing SPARK-25072 broke existing code and fails to show error message > -- > > Key: SPARK-26810 > URL: https://issues.apache.org/jira/browse/SPARK-26810 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.0 >Reporter: Arttu Voutilainen >Priority: Minor > > Hey, > We upgraded Spark recently, and > https://issues.apache.org/jira/browse/SPARK-25072 caused our pipeline to fail > after the upgrade. Annoyingly, the error message formatting also threw an > exception itself, thus hiding the message we should have seen. 
> Repro using gettyimages/docker-spark, on 2.4.0: > {code} > from pyspark.sql import Row > r = Row(['a','b']) > r('1', '2') > {code} > {code} > Traceback (most recent call last): > File "", line 1, in > File "/usr/spark-2.4.0/python/pyspark/sql/types.py", line 1505, in __call__ > "but got %s" % (self, len(self), args)) > File "/usr/spark-2.4.0/python/pyspark/sql/types.py", line 1552, in __repr__ > return "" % ", ".join(self) > TypeError: sequence item 0: expected str instance, list found > {code} > On 2.3.1, and also showing how this was used: > {code} > from pyspark.sql import Row, types as T > r = Row(['a','b']) > df = spark.createDataFrame([Row(col='doesntmatter')]) > rdd = df.rdd.mapPartitions(lambda p: [r('a1','b2')]) > spark.createDataFrame(rdd, T.StructType([T.StructField('a', T.StringType()), > T.StructField('b', T.StringType())])).collect() > {code} > {code} > [Row(a='a1', b='b2'), Row(a='a1', b='b2')] > {code} > While I do think the code we had was quite horrible, it used to work. The > unexpected error came from __repr__ as it assumes that the arguments given to > Row constructor are strings. That sounds like a reasonable assumption, should > the Row constructor validate that it holds true maybe? (I guess that might be > another potentially breaking change though, if someone has as weird code as > this one...) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26822) Upgrade the deprecated module 'optparse'
Neil Chien created SPARK-26822: -- Summary: Upgrade the deprecated module 'optparse' Key: SPARK-26822 URL: https://issues.apache.org/jira/browse/SPARK-26822 Project: Spark Issue Type: Task Components: Tests Affects Versions: 2.4.0 Reporter: Neil Chien Follow the [official document|https://docs.python.org/2/library/argparse.html#upgrading-optparse-code] to upgrade the deprecated module 'optparse' to 'argparse'.
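The migration pattern from the linked document looks like this (the option names below are hypothetical, not taken from Spark's actual scripts):

```python
import argparse

# Before (optparse, deprecated since Python 2.7):
#     from optparse import OptionParser
#     parser = OptionParser()
#     parser.add_option("-o", "--output", dest="output", default="out.txt")
#     (options, args) = parser.parse_args()
#
# After: argparse handles options and positional arguments in one parser,
# replacing optparse's separate (options, args) tuple.
parser = argparse.ArgumentParser(
    description="optparse -> argparse migration sketch")
parser.add_argument("-o", "--output", default="out.txt")
parser.add_argument("inputs", nargs="*")

args = parser.parse_args(["-o", "result.txt", "a.csv", "b.csv"])
```

Besides merging options and positionals, argparse also generates help text and validates choices and types, which optparse left to the caller.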
[jira] [Assigned] (SPARK-26822) Upgrade the deprecated module 'optparse'
[ https://issues.apache.org/jira/browse/SPARK-26822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26822: Assignee: Apache Spark > Upgrade the deprecated module 'optparse' > > > Key: SPARK-26822 > URL: https://issues.apache.org/jira/browse/SPARK-26822 > Project: Spark > Issue Type: Task > Components: Tests >Affects Versions: 2.4.0 >Reporter: Neil Chien >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available, test > > Follow the [official > document|https://docs.python.org/2/library/argparse.html#upgrading-optparse-code] > to upgrade the deprecated module 'optparse' to 'argparse'. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26822) Upgrade the deprecated module 'optparse'
[ https://issues.apache.org/jira/browse/SPARK-26822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26822: Assignee: (was: Apache Spark) > Upgrade the deprecated module 'optparse' > > > Key: SPARK-26822 > URL: https://issues.apache.org/jira/browse/SPARK-26822 > Project: Spark > Issue Type: Task > Components: Tests >Affects Versions: 2.4.0 >Reporter: Neil Chien >Priority: Minor > Labels: pull-request-available, test > > Follow the [official > document|https://docs.python.org/2/library/argparse.html#upgrading-optparse-code] > to upgrade the deprecated module 'optparse' to 'argparse'. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26822) Upgrade the deprecated module 'optparse'
[ https://issues.apache.org/jira/browse/SPARK-26822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759441#comment-16759441 ] Apache Spark commented on SPARK-26822: -- User 'cchung100m' has created a pull request for this issue: https://github.com/apache/spark/pull/23730 > Upgrade the deprecated module 'optparse' > > > Key: SPARK-26822 > URL: https://issues.apache.org/jira/browse/SPARK-26822 > Project: Spark > Issue Type: Task > Components: Tests >Affects Versions: 2.4.0 >Reporter: Neil Chien >Priority: Minor > Labels: pull-request-available, test > > Follow the [official > document|https://docs.python.org/2/library/argparse.html#upgrading-optparse-code] > to upgrade the deprecated module 'optparse' to 'argparse'. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26751) HiveSessionImpl might have memory leak since Operation do not close properly
[ https://issues.apache.org/jira/browse/SPARK-26751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-26751: -- Priority: Minor (was: Major) > HiveSessionImpl might have memory leak since Operation do not close properly > > > Key: SPARK-26751 > URL: https://issues.apache.org/jira/browse/SPARK-26751 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: zhoukang >Assignee: zhoukang >Priority: Minor > Fix For: 2.3.4, 2.4.1, 3.0.0 > > Attachments: 26751.png > > > When we run in background and we get exception which is not HiveSQLException, > we may encounter memory leak since handleToOperation will not removed > correctly. > The reason is below: > 1. when calling operation.run we throw an exception which is not > HiveSQLException > 2. then opHandleSet will not add the opHandle, and > operationManager.closeOperation(opHandle); will not be called > {code:java} > private OperationHandle executeStatementInternal(String statement, > Map confOverlay, boolean runAsync) throws HiveSQLException { > this.acquire(true); > OperationManager operationManager = this.getOperationManager(); > ExecuteStatementOperation operation = > operationManager.newExecuteStatementOperation(this.getSession(), statement, > confOverlay, runAsync); > OperationHandle opHandle = operation.getHandle(); > OperationHandle e; > try { > operation.run(); > this.opHandleSet.add(opHandle); > e = opHandle; > } catch (HiveSQLException var11) { > operationManager.closeOperation(opHandle); > throw var11; > } finally { > this.release(true); > } > return e; > } > try { > // This submit blocks if no background threads are available to run > this operation > val backgroundHandle = > > parentSession.getSessionManager().submitBackgroundOperation(backgroundOperation) > setBackgroundHandle(backgroundHandle) > } catch { > case rejected: RejectedExecutionException => > setState(OperationState.ERROR) > throw new HiveSQLException("The background 
threadpool cannot > accept" + > " new task for execution, please retry the operation", rejected) > case NonFatal(e) => > logError(s"Error executing query in background", e) > setState(OperationState.ERROR) > throw e > } > } > {code} > 3. when we close the session we will also call > operationManager.closeOperation(opHandle),since we did not add this opHandle > into the opHandleSet. > {code} > public void close() throws HiveSQLException { > try { > this.acquire(true); > Iterator ioe = this.opHandleSet.iterator(); > while(ioe.hasNext()) { > OperationHandle opHandle = (OperationHandle)ioe.next(); > this.operationManager.closeOperation(opHandle); > } > this.opHandleSet.clear(); > this.cleanupSessionLogDir(); > this.cleanupPipeoutFile(); > HiveHistory ioe1 = this.sessionState.getHiveHistory(); > if(null != ioe1) { > ioe1.closeStream(); > } > try { > this.sessionState.close(); > } finally { > this.sessionState = null; > } > } catch (IOException var17) { > throw new HiveSQLException("Failure to close", var17); > } finally { > if(this.sessionState != null) { > try { > this.sessionState.close(); > } catch (Throwable var15) { > LOG.warn("Error closing session", var15); > } > this.sessionState = null; > } > this.release(true); > } > } > {code} > 4. however, the opHandle will added into handleToOperation for each statement > {code} > val handleToOperation = ReflectionUtils > .getSuperField[JMap[OperationHandle, Operation]](this, > "handleToOperation") > val sessionToActivePool = new ConcurrentHashMap[SessionHandle, String]() > val sessionToContexts = new ConcurrentHashMap[SessionHandle, SQLContext]() > override def newExecuteStatementOperation( > parentSession: HiveSession, > statement: String, > confOverlay: JMap[String, String], > async: Boolean): ExecuteStatementOperation = synchronized { > val sqlContext = sessionToContexts.get(parentSession.getSessionHandle) > require(sqlContext != null, s"Session handle: > ${parentSession.getSessionHandle} has not been" + >
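The leak described in steps 1-4 comes down to cleaning up only on `HiveSQLException` while `handleToOperation` is populated for every statement. A Python sketch of the leak and its cleanup pattern (class and method names are illustrative, not the actual HiveSessionImpl API):

```python
class OperationManager:
    """Illustrative stand-in: handle_to_operation is the map that leaked."""

    def __init__(self):
        self.handle_to_operation = {}

    def new_operation(self, run):
        handle = object()
        # Registered for every statement, as in step 4 of the report.
        self.handle_to_operation[handle] = run
        return handle

    def close_operation(self, handle):
        self.handle_to_operation.pop(handle, None)


def execute_statement(manager, run):
    """Leak-free variant: close the operation on *any* exception. The
    original code only cleaned up on HiveSQLException, so other errors
    (e.g. RejectedExecutionException) left the handle behind forever."""
    handle = manager.new_operation(run)
    try:
        run()
        return handle
    except Exception:  # catch broadly, mirroring the fix
        manager.close_operation(handle)
        raise


manager = OperationManager()
ok_handle = execute_statement(manager, lambda: None)


def failing():
    raise RuntimeError("background task rejected")  # not a HiveSQLException


try:
    execute_statement(manager, failing)
except RuntimeError:
    pass
```

After the failing statement, only the successful operation's handle remains registered; session close can then release everything it knows about.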
[jira] [Assigned] (SPARK-26751) HiveSessionImpl might have memory leak since Operation do not close properly
[ https://issues.apache.org/jira/browse/SPARK-26751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-26751:
---------------------------------
Assignee: zhoukang

> HiveSessionImpl might have memory leak since Operation do not close properly
> Key: SPARK-26751
> URL: https://issues.apache.org/jira/browse/SPARK-26751
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: zhoukang
> Assignee: zhoukang
> Priority: Major
> Attachments: 26751.png
[jira] [Resolved] (SPARK-26751) HiveSessionImpl might have memory leak since Operation do not close properly
[ https://issues.apache.org/jira/browse/SPARK-26751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-26751.
-------------------------------
Resolution: Fixed
Fix Version/s: 2.3.4, 2.4.1, 3.0.0
Issue resolved by pull request 23673 [https://github.com/apache/spark/pull/23673]

> HiveSessionImpl might have memory leak since Operation do not close properly
> Key: SPARK-26751
> URL: https://issues.apache.org/jira/browse/SPARK-26751
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: zhoukang
> Assignee: zhoukang
> Priority: Major
> Fix For: 3.0.0, 2.4.1, 2.3.4
> Attachments: 26751.png
[jira] [Commented] (SPARK-26810) Fixing SPARK-25072 broke existing code and fails to show error message
[ https://issues.apache.org/jira/browse/SPARK-26810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759423#comment-16759423 ] Arttu Voutilainen commented on SPARK-26810:
-------------------------------------------
[~hyukjin.kwon] thanks for checking this! You don't see the error message of SPARK-25072 precisely because of that other issue: while formatting the error message, it throws because of SPARK-23299. I have no idea why someone originally wrote Row(['a', 'b']); I don't think it is documented. Still, it used to work before the SPARK-25072 fix, and because of SPARK-23299 it wasn't easy to understand why it broke (the real error message is hidden).

> Fixing SPARK-25072 broke existing code and fails to show error message
>
> Key: SPARK-26810
> URL: https://issues.apache.org/jira/browse/SPARK-26810
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.4.0
> Reporter: Arttu Voutilainen
> Priority: Minor
>
> Hey,
> We upgraded Spark recently, and https://issues.apache.org/jira/browse/SPARK-25072 caused our pipeline to fail after the upgrade. Annoyingly, the error-message formatting also threw an exception itself, thus hiding the message we should have seen.
> Repro using gettyimages/docker-spark, on 2.4.0:
> {code}
> from pyspark.sql import Row
> r = Row(['a','b'])
> r('1', '2')
> {code}
> {code}
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/spark-2.4.0/python/pyspark/sql/types.py", line 1505, in __call__
>     "but got %s" % (self, len(self), args))
>   File "/usr/spark-2.4.0/python/pyspark/sql/types.py", line 1552, in __repr__
>     return "<Row(%s)>" % ", ".join(self)
> TypeError: sequence item 0: expected str instance, list found
> {code}
> On 2.3.1, also showing how this was used:
> {code}
> from pyspark.sql import Row, types as T
> r = Row(['a','b'])
> df = spark.createDataFrame([Row(col='doesntmatter')])
> rdd = df.rdd.mapPartitions(lambda p: [r('a1','b2')])
> spark.createDataFrame(rdd, T.StructType([T.StructField('a', T.StringType()), T.StructField('b', T.StringType())])).collect()
> {code}
> {code}
> [Row(a='a1', b='b2'), Row(a='a1', b='b2')]
> {code}
> While I do think the code we had was quite horrible, it used to work. The unexpected error came from __repr__, which assumes that the arguments given to the Row constructor are strings. That sounds like a reasonable assumption; should the Row constructor perhaps validate that it holds? (I guess that might be another potentially breaking change, though, if someone has code as weird as this one...)

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
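The masking mechanism in the traceback above can be reproduced without pyspark: `", ".join(seq)` raises TypeError as soon as `seq` contains a non-string, so a `__repr__` that joins its items blows up while the "wrong number of arguments" error message is still being formatted. A minimal plain-Python sketch (all class and method names here are illustrative stand-ins, not pyspark's actual implementation):

```python
class FieldSpec(tuple):
    """Minimal stand-in for pyspark's Row-used-as-a-class (illustrative only)."""

    def __new__(cls, *fields):
        # Row('a', 'b') -> two fields; Row(['a', 'b']) -> ONE field that is a list.
        return super().__new__(cls, fields)

    def __call__(self, *values):
        if len(values) != len(self):
            # Formatting this message invokes __repr__ on self (%r) ...
            raise ValueError("Expected %d values but got %d: %r"
                             % (len(self), len(values), self))
        return dict(zip(self, values))

    def __repr__(self):
        # ... which assumes every field is a string, as types.py does:
        return "<FieldSpec(%s)>" % ", ".join(self)


ok = FieldSpec('a', 'b')   # field names passed individually: fine
print(ok('1', '2'))        # {'a': '1', 'b': '2'}

bad = FieldSpec(['a', 'b'])  # a single list becomes one (non-str) field
try:
    bad('1', '2')            # arity mismatch -> tries to raise ValueError ...
except TypeError as e:
    # ... but str.join on a list fails first, so the TypeError from message
    # formatting escapes and masks the original arity error.
    print(type(e).__name__)  # TypeError
```

This mirrors why the user saw `TypeError: sequence item 0: expected str instance, list found` instead of the intended length-mismatch message.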
[jira] [Resolved] (SPARK-26818) Make MLEvents JSON ser/de safe
[ https://issues.apache.org/jira/browse/SPARK-26818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-26818.
----------------------------------
Resolution: Fixed
Fix Version/s: 3.0.0
Issue resolved by pull request 23728 [https://github.com/apache/spark/pull/23728]

> Make MLEvents JSON ser/de safe
>
> Key: SPARK-26818
> URL: https://issues.apache.org/jira/browse/SPARK-26818
> Project: Spark
> Issue Type: Bug
> Components: ML
> Affects Versions: 3.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Fix For: 3.0.0
>
> It looks like ML events are not JSON-serializable. We can make them serializable in the same way as, for example:
> {code}
> @DeveloperApi
> case class SparkListenerSQLExecutionEnd(executionId: Long, time: Long)
>   extends SparkListenerEvent {
>
>   // The name of the execution, e.g. `df.collect` will trigger a SQL execution with name "collect".
>   @JsonIgnore private[sql] var executionName: Option[String] = None
>
>   // The following 3 fields are only accessed when `executionName` is defined.
>
>   // The duration of the SQL execution, in nanoseconds.
>   @JsonIgnore private[sql] var duration: Long = 0L
>
>   // The `QueryExecution` instance that represents the SQL execution
>   @JsonIgnore private[sql] var qe: QueryExecution = null
>
>   // The exception object that caused this execution to fail. None if the execution doesn't fail.
>   @JsonIgnore private[sql] var executionFailure: Option[Exception] = None
> }
> {code}
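The approach quoted above relies on Jackson's @JsonIgnore: fields that hold non-serializable helper state (such as the QueryExecution instance) are skipped during JSON ser/de, so only the JSON-safe fields survive. As a rough Python analogy (illustrative only, not Spark's code; the field names follow the Scala example), an encoder can simply filter out an "ignored" set of attributes:

```python
import json

# Analogous to marking fields with @JsonIgnore in the Scala case class:
IGNORED = {'executionName', 'duration', 'qe', 'executionFailure'}


class SQLExecutionEnd:
    """Rough Python analogy of SparkListenerSQLExecutionEnd (illustrative)."""

    def __init__(self, execution_id, time):
        self.executionId = execution_id
        self.time = time
        # Mutable helper state that is not JSON-friendly:
        self.executionName = None
        self.duration = 0
        self.qe = object()  # stands in for the QueryExecution instance
        self.executionFailure = None


def to_json(event):
    # Serialize only the JSON-safe fields, skipping the 'ignored' ones.
    payload = {k: v for k, v in vars(event).items() if k not in IGNORED}
    return json.dumps(payload, sort_keys=True)


print(to_json(SQLExecutionEnd(42, 1549000000)))
# {"executionId": 42, "time": 1549000000}
```

Without the filter, `json.dumps` would raise a TypeError on the bare `object()` field, which is the analogue of the events not being JSON-serializable.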
[jira] [Commented] (SPARK-26821) filters not working with char datatype when querying against hive table
[ https://issues.apache.org/jira/browse/SPARK-26821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759381#comment-16759381 ] Sujith commented on SPARK-26821:
--------------------------------
As per the initial analysis, this happens because the declared char length is 5 while the inserted value has length 2; since the column is a char type, the system pads the remainder of the stored value with spaces. When a filter is applied, the predicate value is compared against the padded table data, e.g. 'ds' == 'ds   ', which yields the wrong result. I am analyzing this issue further; please let me know of any suggestions or guidance. Thanks.

> filters not working with char datatype when querying against hive table
>
> Key: SPARK-26821
> URL: https://issues.apache.org/jira/browse/SPARK-26821
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Sujith
> Priority: Major
>
> Create a table with a char type field. While inserting data into the char column, if the string is shorter than the declared length, Spark 2.x will not process filter queries properly, leading to incorrect results.
> 0: jdbc:hive2://10.19.89.222:22550/default> create table jj(id int, name char(5));
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.894 seconds)
> 0: jdbc:hive2://10.19.89.222:22550/default> insert into table jj values(232,'ds');
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (1.815 seconds)
> 0: jdbc:hive2://10.19.89.222:22550/default> select * from jj where name='ds';
> +-----+-------+
> | id  | name  |
> +-----+-------+
> +-----+-------+
>
> The above query will not give any result.
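The comparison failure described in the comment can be reproduced with plain string semantics: a char(5) column stores 'ds' as 'ds   ', and an untrimmed equality check then fails. A small Python sketch of the mechanism (illustrative only; Spark and Hive implement this in their own engines, and the fix would live there):

```python
CHAR_LEN = 5  # char(5), matching the example table


def char_store(value, length=CHAR_LEN):
    """CHAR semantics: pad the stored value with spaces to the declared length."""
    return value.ljust(length)


stored = char_store('ds')   # what lands in the table: 'ds   '
predicate = 'ds'            # what the filter compares against

print(stored == predicate)           # False -> the row is wrongly filtered out
print(stored.rstrip() == predicate)  # True  -> trimming restores the match
```

This is why the SQL standard requires either pad-then-compare or trim-then-compare semantics for CHAR equality; comparing the padded stored value against an unpadded literal gives the empty result seen above.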
[jira] [Commented] (SPARK-26821) filters not working with char datatype when querying against hive table
[ https://issues.apache.org/jira/browse/SPARK-26821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759382#comment-16759382 ] Sujith commented on SPARK-26821:
--------------------------------
cc [~dongjoon]

> filters not working with char datatype when querying against hive table
> Key: SPARK-26821
> URL: https://issues.apache.org/jira/browse/SPARK-26821
[jira] [Comment Edited] (SPARK-26821) filters not working with char datatype when querying against hive table
[ https://issues.apache.org/jira/browse/SPARK-26821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759382#comment-16759382 ] Sujith edited comment on SPARK-26821 at 2/3/19 11:44 AM:
---------------------------------------------------------
cc [~dongjoon] [~vinodkc]

was (Author: s71955): cc [~dongjoon]

> filters not working with char datatype when querying against hive table
> Key: SPARK-26821
> URL: https://issues.apache.org/jira/browse/SPARK-26821
[jira] [Updated] (SPARK-26821) filters not working with char datatype when querying against hive table
[ https://issues.apache.org/jira/browse/SPARK-26821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26821:
---------------------------
Description:
Create a table with a char type field. While inserting data into the char column, if the string is shorter than the declared length, Spark 2.x will not process filter queries properly, leading to incorrect results.

0: jdbc:hive2://10.19.89.222:22550/default> create table jj(id int, name char(5));
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.894 seconds)
0: jdbc:hive2://10.19.89.222:22550/default> insert into table jj values(232,'ds');
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.815 seconds)
0: jdbc:hive2://10.19.89.222:22550/default> select * from jj where name='ds';
+-----+-------+
| id  | name  |
+-----+-------+
+-----+-------+

The above query will not give any result.

> filters not working with char datatype when querying against hive table
> Key: SPARK-26821
> URL: https://issues.apache.org/jira/browse/SPARK-26821
[jira] [Created] (SPARK-26821) filters not working with char datatype when querying against hive table
Sujith created SPARK-26821:
---------------------------
Summary: filters not working with char datatype when querying against hive table
Key: SPARK-26821
URL: https://issues.apache.org/jira/browse/SPARK-26821
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.0
Reporter: Sujith

Create a table with a char type field. While inserting data into the char column, if the string is shorter than the declared length, Spark 2.x will not process filter queries properly, leading to incorrect results.

0: jdbc:hive2://10.19.89.222:22550/default> create table jj(id int, name char(5));
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.894 seconds)
0: jdbc:hive2://10.19.89.222:22550/default> insert into table jj values(232,'ds');
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.815 seconds)
0: jdbc:hive2://10.19.89.222:22550/default> select * from jj where name='ds';
+-----+-------+
| id  | name  |
+-----+-------+
+-----+-------+
[jira] [Created] (SPARK-26820) Issue Error/Warning when Hint is not applicable
Xiao Li created SPARK-26820:
---------------------------
Summary: Issue Error/Warning when Hint is not applicable
Key: SPARK-26820
URL: https://issues.apache.org/jira/browse/SPARK-26820
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Xiao Li

We should issue an error or a warning when the HINT is not applicable. This should be configurable.