+0

I understand that schema pruning is an experimental feature in Spark
2.4, and that it can help read performance a lot, since people are
trying to keep hierarchical data in nested format.

We just found a serious bug: it can cause the Parquet reader to fail if a
nested field and a top-level field are selected simultaneously.
https://issues.apache.org/jira/browse/SPARK-25879

If we decide not to fix it in 2.4, we should at least document it in
the release notes so that users know.

Sincerely,

DB Tsai
----------------------------------------------------------
Web: https://www.dbtsai.com
PGP Key ID: 0x5CED8B896A6BDFA0
On Mon, Oct 29, 2018 at 8:42 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>
> +1
>
> On Tue, Oct 30, 2018 at 11:03 AM Gengliang Wang <ltn...@gmail.com> wrote:
>>
>> +1
>>
>> > On Oct 30, 2018, at 10:41 AM, Sean Owen <sro...@gmail.com> wrote:
>> >
>> > +1
>> >
>> > Same result as in RC4 from me, and the issues I know of that were
>> > raised with RC4 are resolved. I tested vs Scala 2.12 and 2.11.
>> >
>> > These items are still targeted to 2.4.0. Xiangrui, I assume these
>> > should just be untargeted now, or resolved?
>> > SPARK-25584 Document libsvm data source in doc site
>> > SPARK-25346 Document Spark builtin data sources
>> > SPARK-24464 Unit tests for MLlib's Instrumentation
>> > On Mon, Oct 29, 2018 at 5:22 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>> >>
>> >> Please vote on releasing the following candidate as Apache Spark
>> >> version 2.4.0.
>> >>
>> >> The vote is open until November 1 PST and passes if a majority of +1
>> >> PMC votes are cast, with a minimum of 3 +1 votes.
>> >>
>> >> [ ] +1 Release this package as Apache Spark 2.4.0
>> >> [ ] -1 Do not release this package because ...
>> >>
>> >> To learn more about Apache Spark, please see http://spark.apache.org/
>> >>
>> >> The tag to be voted on is v2.4.0-rc5
>> >> (commit 0a4c03f7d084f1d2aa48673b99f3b9496893ce8d):
>> >> https://github.com/apache/spark/tree/v2.4.0-rc5
>> >>
>> >> The release files, including signatures, digests, etc. can be found at:
>> >> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc5-bin/
>> >>
>> >> Signatures used for Spark RCs can be found in this file:
>> >> https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >>
>> >> The staging repository for this release can be found at:
>> >> https://repository.apache.org/content/repositories/orgapachespark-1291
>> >>
>> >> The documentation corresponding to this release can be found at:
>> >> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc5-docs/
>> >>
>> >> The list of bug fixes going into 2.4.0 can be found at the following URL:
>> >> https://issues.apache.org/jira/projects/SPARK/versions/12342385
>> >>
>> >> FAQ
>> >>
>> >> =========================
>> >> How can I help test this release?
>> >> =========================
>> >>
>> >> If you are a Spark user, you can help us test this release by taking
>> >> an existing Spark workload and running it on this release candidate, then
>> >> reporting any regressions.
>> >>
>> >> If you're working in PySpark, you can set up a virtual env, install
>> >> the current RC, and see if anything important breaks. In Java/Scala,
>> >> you can add the staging repository to your project's resolvers and test
>> >> with the RC (make sure to clean up the artifact cache before/after so
>> >> you don't end up building with an out-of-date RC going forward).
>> >>
>> >> ===========================================
>> >> What should happen to JIRA tickets still targeting 2.4.0?
>> >> ===========================================
>> >>
>> >> The current list of open tickets targeted at 2.4.0 can be found at
>> >> https://issues.apache.org/jira/projects/SPARK by searching for
>> >> "Target Version/s" = 2.4.0
>> >>
>> >> Committers should look at those and triage. Extremely important bug
>> >> fixes, documentation, and API tweaks that impact compatibility should
>> >> be worked on immediately. Everything else please retarget to an
>> >> appropriate release.
>> >>
>> >> ==================
>> >> But my bug isn't fixed?
>> >> ==================
>> >>
>> >> In order to make timely releases, we will typically not hold the
>> >> release unless the bug in question is a regression from the previous
>> >> release. That being said, if there is something which is a regression
>> >> that has not been correctly targeted, please ping me or a committer to
>> >> help target the issue.
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >
>>