Interesting opinion, thank you. Still, according to the project websites, Parquet is basically inspired by Dremel (Google) [1], and parts of ORC were enhanced while it was deployed at Facebook and Yahoo [2].
Other than this presentation [3], do you guys know of any other benchmarks?

[1] https://parquet.apache.org/documentation/latest/
[2] https://orc.apache.org/docs/
[3] http://www.slideshare.net/oom65/file-format-benchmarks-avro-json-orc-parquet

> On 26 Jul 2016, at 15:19, Koert Kuipers <ko...@tresata.com> wrote:
>
> when parquet came out it was developed by a community of companies, and was
> designed as a library to be supported by multiple big data projects. nice
>
> orc on the other hand initially only supported hive. it wasn't even designed
> as a library that can be re-used. even today it brings in the kitchen sink of
> transitive dependencies. yikes
>
>
> On Jul 26, 2016 5:09 AM, "Jörn Franke" <jornfra...@gmail.com> wrote:
> I think both are very similar, but with slightly different goals. While they
> work transparently for each Hadoop application, you need to enable specific
> support in the application for predicate push down.
> In the end you have to check which application you are using and do some
> tests (with correct predicate push down configuration). Keep in mind that
> both formats work best if they are sorted on filter columns (which is your
> responsibility) and if their optimizations are correctly configured (min
> max index, bloom filter, compression etc.).
>
> If you need to ingest sensor data you may want to store it first in HBase and
> then batch process it in large files in ORC or Parquet format.
>
> On 26 Jul 2016, at 04:09, janardhan shetty <janardhan...@gmail.com> wrote:
>
>> Just wondering about the advantages and disadvantages of converting data
>> into ORC or Parquet.
>>
>> In the documentation of Spark there are numerous examples of Parquet format.
>>
>> Any strong reasons to choose Parquet over ORC file format?
>>
>> Also: current data compression is bzip2
>>
>> http://stackoverflow.com/questions/32373460/parquet-vs-orc-vs-orc-with-snappy
>>
>> This seems biased.
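
To make the point quoted above about sorting on filter columns a bit more concrete, here is a toy sketch in plain Python. It is not ORC or Parquet reader code: the "chunks" only stand in for stripes or row groups, and the data, chunk size, and function names (chunk_stats, chunks_to_read) are made up for illustration. It just counts how many chunks a min/max index would still force a reader to scan for a range filter, with and without sorting.

```python
import random


def chunk_stats(values, chunk_size):
    """Split values into fixed-size chunks and record (min, max) per chunk."""
    return [
        (min(values[i:i + chunk_size]), max(values[i:i + chunk_size]))
        for i in range(0, len(values), chunk_size)
    ]


def chunks_to_read(stats, lo, hi):
    """Count chunks whose [min, max] range overlaps the filter lo <= x <= hi."""
    return sum(1 for cmin, cmax in stats if cmax >= lo and cmin <= hi)


random.seed(0)
# Hypothetical sensor readings, written in arrival (unsorted) order.
readings = [random.randint(0, 9_999) for _ in range(10_000)]

unsorted_stats = chunk_stats(readings, 1_000)
sorted_stats = chunk_stats(sorted(readings), 1_000)

# Same filter against both layouts: 2000 <= reading <= 2100.
unsorted_reads = chunks_to_read(unsorted_stats, 2_000, 2_100)
sorted_reads = chunks_to_read(sorted_stats, 2_000, 2_100)

print("chunks read, unsorted:", unsorted_reads)
print("chunks read, sorted:  ", sorted_reads)
```

In the unsorted layout nearly every chunk spans the whole value range, so the min/max index cannot rule much out; after sorting, only the chunks around the requested range overlap the filter. Real engines add further factors (bloom filters, compression, push-down settings in the application), so this is only a rough intuition and no substitute for testing with your own data.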