[GitHub] [spark] goutam-git commented on a diff in pull request #37065: [SPARK-38699][SQL] Use error classes in the execution errors of dictionary encoding
goutam-git commented on code in PR #37065: URL: https://github.com/apache/spark/pull/37065#discussion_r932196681 ## sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/compression/compressionSchemes.scala: ## @@ -421,7 +421,7 @@ private[columnar] case object DictionaryEncoding extends CompressionScheme { override def compress(from: ByteBuffer, to: ByteBuffer): ByteBuffer = { if (overflow) { Review Comment: @MaxGekk should I remove the assert and use the new method name with error class instead of useDictionaryEncodingWhenDictionaryOverflowError() -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] goutam-git commented on a diff in pull request #37065: [SPARK-38699][SQL] Use error classes in the execution errors of dictionary encoding
goutam-git commented on code in PR #37065: URL: https://github.com/apache/spark/pull/37065#discussion_r928926262 ## sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/compression/compressionSchemes.scala: ## @@ -421,7 +421,7 @@ private[columnar] case object DictionaryEncoding extends CompressionScheme { override def compress(from: ByteBuffer, to: ByteBuffer): ByteBuffer = { if (overflow) { Review Comment: @MaxGekk on replacing if(overflow) with asser , pyspark test for CrossValidatorTests is failing for this assertion. File "/__w/spark/spark/python/pyspark/ml/tests/test_tuning.py", line 141, in test_copy cvModel = cv.fit(dataset) py4j.protocol.Py4JJavaError: An error occurred while calling o48.evaluate. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 12) (localhost executor driver): java.lang.AssertionError: assertion failed at scala.Predef$.assert(Predef.scala:208) at org.apache.spark.sql.execution.columnar.compression.DictionaryEncoding$Encoder.compress(compressionSchemes.scala:425) Please advice. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] goutam-git commented on a diff in pull request #37065: [SPARK-38699][SQL] Use error classes in the execution errors of dictionary encoding
goutam-git commented on code in PR #37065: URL: https://github.com/apache/spark/pull/37065#discussion_r922831090 ## sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/compression/compressionSchemes.scala: ## @@ -421,7 +421,7 @@ private[columnar] case object DictionaryEncoding extends CompressionScheme { override def compress(from: ByteBuffer, to: ByteBuffer): ByteBuffer = { if (overflow) { Review Comment: @MaxGekk I used the DictionaryEncodingSuite and found that in case of big enough Dictionary the users face java.lang.IllegalArgumentException from ByteBuffer.allocate in build method in CompressibleColumnBuilder as a result of which they never reach encoder.compress which throws useDictionaryEncodingWhenDictionaryOverflowError Please advice next steps -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] goutam-git commented on a diff in pull request #37065: [SPARK-38699][SQL] Use error classes in the execution errors of dictionary encoding
goutam-git commented on code in PR #37065: URL: https://github.com/apache/spark/pull/37065#discussion_r917225777 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -878,9 +878,10 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase { new Exception(s"Unsupported type: ${dataType.catalogString}") } - def useDictionaryEncodingWhenDictionaryOverflowError(): Throwable = { -new IllegalStateException( Review Comment: @HyukjinKwon This appears to be an internal exception not exposed to users -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] goutam-git commented on a diff in pull request #37065: [SPARK-38699][SQL] Use error classes in the execution errors of dictionary encoding
goutam-git commented on code in PR #37065: URL: https://github.com/apache/spark/pull/37065#discussion_r915497502 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -878,9 +878,10 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase { new Exception(s"Unsupported type: ${dataType.catalogString}") } - def useDictionaryEncodingWhenDictionaryOverflowError(): Throwable = { -new IllegalStateException( Review Comment: This appears to be an internal exception not exposed to users -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] goutam-git commented on a diff in pull request #37065: [SPARK-38699][SQL] Use error classes in the execution errors of dictionary encoding
goutam-git commented on code in PR #37065: URL: https://github.com/apache/spark/pull/37065#discussion_r915497502 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -878,9 +878,10 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase { new Exception(s"Unsupported type: ${dataType.catalogString}") } - def useDictionaryEncodingWhenDictionaryOverflowError(): Throwable = { -new IllegalStateException( Review Comment: This appears to be an internal exception not exposed to users -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org