LuciferYang commented on PR #36781:
URL: https://github.com/apache/spark/pull/36781#issuecomment-1149455635

   I think this PR should be backported to previous Spark versions, because when I ran `SPARK-39393: Do not push down predicate filters for repeated primitive fields` without this PR, I got the following error:
   
   ```
   Last failure message: There are 1 possibly leaked file streams..
        at org.scalatest.enablers.Retrying$$anon$4.tryTryAgain$2(Retrying.scala:185)
        at org.scalatest.enablers.Retrying$$anon$4.retry(Retrying.scala:192)
        at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:402)
        at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:401)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetFilterSuite.eventually(ParquetFilterSuite.scala:77)
        at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:312)
        at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:311)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetFilterSuite.eventually(ParquetFilterSuite.scala:77)
        at org.apache.spark.sql.test.SharedSparkSessionBase.afterEach(SharedSparkSession.scala:164)
        at org.apache.spark.sql.test.SharedSparkSessionBase.afterEach$(SharedSparkSession.scala:158)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetFilterSuite.afterEach(ParquetFilterSuite.scala:100)
        at org.scalatest.BeforeAndAfterEach.$anonfun$runTest$1(BeforeAndAfterEach.scala:247)
        at org.scalatest.Status.$anonfun$withAfterEffect$1(Status.scala:377)
        at org.scalatest.Status.$anonfun$withAfterEffect$1$adapted(Status.scala:373)
        at org.scalatest.FailedStatus$.whenCompleted(Status.scala:505)
        at org.scalatest.Status.withAfterEffect(Status.scala:373)
        at org.scalatest.Status.withAfterEffect$(Status.scala:371)
        at org.scalatest.FailedStatus$.withAfterEffect(Status.scala:477)
        at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:246)
        at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
        at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:64)
        at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:233)
        at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
        at scala.collection.immutable.List.foreach(List.scala:431)
        at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
        at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
        at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
        at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:233)
        at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:232)
        at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1563)
        at org.scalatest.Suite.run(Suite.scala:1112)
        at org.scalatest.Suite.run$(Suite.scala:1094)
        at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1563)
        at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:237)
        at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
        at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:237)
        at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:236)
        at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:64)
        at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
        at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
        at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
        at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:64)
        at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
        at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13(Runner.scala:1320)
        at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13$adapted(Runner.scala:1314)
        at scala.collection.immutable.List.foreach(List.scala:431)
        at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1314)
        at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24(Runner.scala:993)
        at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24$adapted(Runner.scala:971)
        at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1480)
        at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:971)
        at org.scalatest.tools.Runner$.run(Runner.scala:798)
        at org.scalatest.tools.Runner.run(Runner.scala)
        at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2or3(ScalaTestRunner.java:38)
        at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:25)
   Caused by: java.lang.IllegalStateException: There are 1 possibly leaked file streams.
        at org.apache.spark.DebugFilesystem$.assertNoOpenStreams(DebugFilesystem.scala:54)
        at org.apache.spark.sql.test.SharedSparkSessionBase.$anonfun$afterEach$1(SharedSparkSession.scala:165)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.scalatest.enablers.Retrying$$anon$4.makeAValiantAttempt$1(Retrying.scala:150)
        at org.scalatest.enablers.Retrying$$anon$4.tryTryAgain$2(Retrying.scala:162)
        ... 54 more
   ```
   
   It seems that this issue is related to parquet-mr:
   
   https://github.com/LuciferYang/parquet-mr/blob/a2da156b251d13bce1fa81eb95b555da04880bc1/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L790
   
   Line 790 may throw an `IOException` without closing `f`. That leak needs to be fixed in `parquet-mr`, but **this PR seems to guard against the issue on the Spark side**.
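   
   For illustration, a minimal Java sketch of the failure mode described above (the `StreamSource`, `parseFooter`, and method names here are invented for the example, not taken from parquet-mr or this PR): if the code between opening and closing a stream throws, the stream is never closed, which is exactly what `DebugFilesystem.assertNoOpenStreams` later reports as a possible leak. Closing on the failure path as well (e.g. try-with-resources) avoids it.
   
   ```java
   import java.io.IOException;
   import java.io.InputStream;
   
   class FooterReaderSketch {
   
     // Leaky shape: an exception thrown between open() and close() skips the close,
     // so the opened stream `f` stays registered as open.
     static byte[] readFooterLeaky(StreamSource source) throws IOException {
       InputStream f = source.open();      // stream opened here
       byte[] footer = parseFooter(f);     // may throw IOException -> `f` leaks
       f.close();
       return footer;
     }
   
     // Defensive shape: try-with-resources closes `f` on both success and failure.
     static byte[] readFooterSafe(StreamSource source) throws IOException {
       try (InputStream f = source.open()) {
         return parseFooter(f);
       }
     }
   
     // Placeholder for the parsing step that can fail mid-read.
     private static byte[] parseFooter(InputStream f) throws IOException {
       byte[] buf = new byte[8];
       if (f.read(buf) < 0) {
         throw new IOException("unexpected end of stream");
       }
       return buf;
     }
   
     // Minimal stand-in for the file-system open call.
     interface StreamSource {
       InputStream open() throws IOException;
     }
   }
   ```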
   
   
    

